Methods of Diagnosis of Colorectal Cancer, Compositions and Methods of
Screening for Colorectal Cancer Modulators 

CROSS-REFERENCES TO RELATED APPLICATIONS 

[01] This application is a continuation in part of US Patent Application
USSN 09/663,733 filed September 15, 2000, which is incorporated herein by reference in its
entirety.



FIELD OF THE INVENTION 
[02] The invention relates to the identification of expression profiles and the 
nucleic acids involved in colorectal cancer, and to the use of such expression profiles and 
nucleic acids in diagnosis and prognosis of colorectal cancer. The invention further relates to 
methods for identifying and using candidate agents and/or targets which modulate colorectal 
cancer. 

BACKGROUND OF THE INVENTION 
[03] Cancer of the colon and/or rectum (referred to as "colorectal cancer")
is significant in Western populations and particularly in the United States. Cancers of the
colon and rectum occur in both men and women, most commonly after the age of 50. These
develop as the result of a pathologic transformation of normal colon epithelium to an invasive
cancer. A number of recently characterized genetic alterations have
been implicated in colorectal cancer, including mutations in two classes of genes,
tumor-suppressor genes and proto-oncogenes, with recent work suggesting that mutations in DNA
repair genes may also be involved in tumorigenesis. For example, inactivating mutations of
both alleles of the adenomatous polyposis coli (APC) gene, a tumor suppressor gene, appear
to be one of the earliest events in colorectal cancer, and may even be the initiating event.
Other genes implicated in colorectal cancer include the MCC gene, the p53 gene, the DCC
(deleted in colorectal carcinoma) gene and other chromosome 18q genes, and genes in the
TGF-β signaling pathway. For a review, see Molecular Biology of Colorectal Cancer, pp.
238-299, in Curr. Probl. Cancer, Sept/Oct 1997; see also Williams, Colorectal Cancer
(1996); Kinsella & Schofield, Colorectal Cancer: A Scientific Perspective (1993); Colorectal
Cancer: Molecular Mechanisms, Premalignant State and its Prevention (Schmiegel &
Scholmerich eds., 2000); Colorectal Cancer: New Aspects of Molecular Biology and Their
Clinical Applications (Hanski et al., eds. 2000); McArdle et al., Colorectal Cancer (2000);
Wanebo, Colorectal Cancer (1993); Levin, The American Cancer Society: Colorectal Cancer
(1999); Treatment of Hepatic Metastases of Colorectal Cancer (Nordlinger & Jaeck eds.,
1993); Management of Colorectal Cancer (Dunitz et al., eds. 1998); Cancer: Principles and
Practice of Oncology (Devita et al., eds. 2001); Surgical Oncology: Contemporary Principles
and Practice (Kirby et al., eds. 2001); Offit, Clinical Cancer Genetics: Risk Counseling and
Management (1997); Radioimmunotherapy of Cancer (Abrams & Fritzberg eds. 2000);
Fleming, AJCC Cancer Staging Handbook (1998); Textbook of Radiation Oncology (Leibel
& Phillips eds. 2000); and Clinical Oncology (Abeloff et al., eds. 2000).

[04] Imaging of colorectal cancer for diagnosis has been problematic and
limited. In addition, metastasis of the tumor to the lumen, and metastasis of tumor cells to
regional lymph nodes are important prognostic factors (see, e.g., PET in Oncology: Basics
and Clinical Application (Ruhlmann et al., eds. 1999)). For example, five year survival rates
drop from 80 percent in patients with no lymph node metastases to 45 to 50 percent in those
patients who do have lymph node metastases. A recent report showed that micrometastases
can be detected from lymph nodes using reverse transcriptase-PCR methods based on the
presence of mRNA for carcinoembryonic antigen, which has previously been shown to be
present in the vast majority of colorectal cancers but not in normal tissues. Liefers et al., New
England J. of Med. 339(4):223 (1998).

[05] Thus, methods that can be used for diagnosis and prognosis of 
colorectal cancer would be desirable. Accordingly, provided herein are methods that can be 
used in diagnosis and prognosis of colorectal cancer. Further provided are methods that can 
be used to screen candidate bioactive agents for the ability to modulate colorectal cancer. 
Additionally, provided herein are molecular targets for therapeutic intervention in colorectal 
and other cancers. 

BRIEF SUMMARY OF THE INVENTION 
[06] The present invention provides novel methods for diagnosis and 
prognosis evaluation for colorectal cancer, as well as methods for screening for compositions 
which modulate colorectal cancer. Methods of treatment of colorectal cancer, as well as 
compositions, are also provided herein. 



[07] In one aspect, a method of screening drug candidates comprises 
providing a cell that expresses an expression profile gene selected from those of Table 1. The
method further includes adding a drug candidate to the cell and determining the effect of the 
drug candidate on the expression of the expression profile gene. 

[08] In one embodiment, the method of screening drug candidates includes 
comparing the level of expression in the absence of the drug candidate to the level of 
expression in the presence of the drug candidate, wherein the concentration of the drug 
candidate can vary when present, and wherein the comparison can occur after addition or 
removal of the drug candidate. In a preferred embodiment, the cell expresses at least two 
expression profile genes. The profile genes may show an increase or decrease. 

[09] Also provided herein is a method of screening for a bioactive agent 
capable of binding to a colorectal cancer modulator protein, the method comprising 
combining the colorectal cancer modulator protein and a candidate bioactive agent, and 
determining the binding of the candidate agent to the colorectal cancer modulator protein. 
Preferably the colorectal cancer modulator protein is a product encoded by a gene of Table 1 
or Table 2. 

[10] Further provided herein is a method for screening for a bioactive agent 
capable of modulating the activity of a colorectal cancer modulator protein. In one 
embodiment, the method comprises combining the colorectal cancer modulator protein and a 
candidate bioactive agent, and determining the effect of the candidate agent on the bioactivity 
of the colorectal cancer modulator protein. Preferably the colorectal cancer modulator 
protein is a product encoded by a gene of Table 1 or Table 2. 

[11] Also provided is a method of evaluating the effect of a candidate 
colorectal cancer drug comprising administering the drug to a transgenic animal expressing or 
over-expressing the colorectal cancer modulator protein, or an animal lacking the colorectal 
cancer modulator protein, for example as a result of a gene knockout. 

[12] Additionally, provided herein is a method of evaluating the effect of a 
candidate colorectal cancer drug comprising administering the drug to a patient and removing 
a cell sample from the patient. The expression profile of the cell is then determined. This 
method may further comprise comparing the expression profile to an expression profile of a 
healthy individual. In a preferred embodiment, said expression profile includes a gene of 
Table 1 or Table 2. 



[13] Moreover, provided herein is a biochip comprising one or more nucleic 
acid segments of Table 1 or Table 2, wherein the biochip comprises fewer than 1000 nucleic 
acid probes. Preferably, at least two nucleic acid segments are included.

[14] Furthermore, a method of diagnosing a disorder associated with 
colorectal cancer is provided. The method comprises determining the expression of a gene of
Table 1 or Table 2, in a first tissue type of a first individual, and comparing the distribution to 
the expression of the gene from a second normal tissue type from the first individual or a 
second unaffected individual. A difference in the expression indicates that the first individual 
has a disorder associated with colorectal cancer. 

[15] In another aspect, the present invention provides an antibody which
specifically binds to a protein encoded by a nucleic acid of Table 1 or Table 2 or a fragment
thereof. Preferably the antibody is a monoclonal antibody. The antibody can be a fragment
of an antibody such as a single stranded antibody as further described herein, or can be
conjugated to another molecule. In one embodiment, the antibody is a humanized antibody.

[16] In one embodiment, a method is provided for screening for a bioactive agent
capable of interfering with the binding of a colorectal cancer modulating protein (colorectal
cancer modulator protein) or a fragment thereof and an antibody which binds to said
colorectal cancer modulator protein or fragment thereof. In a preferred embodiment, the
method comprises combining a colorectal cancer modulator protein or fragment thereof, a
candidate bioactive agent and an antibody which binds to said colorectal cancer modulator
protein or fragment thereof. The method further includes determining the binding of said
colorectal cancer modulator protein or fragment thereof and said antibody. Where there is
a change in binding, an agent is identified as an interfering agent. The interfering agent can
be an agonist or an antagonist. Preferably, the agent inhibits colorectal cancer.

[17] In a further aspect, a method for inhibiting colorectal cancer is
provided. The method can be performed in vitro or in vivo, preferably in vivo to an
individual. In a preferred embodiment the method of inhibiting colorectal cancer is provided
to an individual with cancer. As described herein, methods of inhibiting colorectal cancer
can be performed by administering an inhibitor of the activity of a protein encoded by a
nucleic acid of Table 1 or Table 2, including an antisense molecule to the gene or its gene
product.

[18] Also provided herein are methods of eliciting an immune response in 
an individual. In one embodiment a method provided herein comprises administering to an 
individual a composition comprising a colorectal cancer modulating protein, or a fragment
thereof. In another embodiment, the protein is encoded by a nucleic acid selected from those
of Table 1 or Table 2. In another aspect, said composition comprises a nucleic acid 
comprising a sequence encoding a colorectal cancer modulating protein, or a fragment 
thereof. 

[19] Further provided herein are compositions capable of eliciting an 
immune response in an individual. In one embodiment, a composition provided herein 
comprises a colorectal cancer modulating protein, preferably encoded by a nucleic acid of 
Table 1 or Table 2, or a fragment thereof, and a pharmaceutically acceptable carrier. In
another embodiment, said composition comprises a nucleic acid comprising a sequence 
encoding a colorectal cancer modulating protein, preferably selected from the nucleic acids of 
Table 1 or Table 2 and a pharmaceutically acceptable carrier. 

[20] Also provided are methods of neutralizing the effect of a colorectal 
cancer protein, or a fragment thereof, comprising contacting an agent specific for said protein 
with said protein in an amount sufficient to effect neutralization. In another embodiment, the 
protein is encoded by a nucleic acid selected from those of Table 1 or Table 2. 

[21] In another aspect of the invention, a method of treating an individual 
for colorectal cancer is provided. In one embodiment, the method comprises administering to 
said individual an inhibitor of a colorectal cancer modulating protein. In another 
embodiment, the method comprises administering to a patient having colorectal cancer an 
antibody to a colorectal cancer modulating protein conjugated to a therapeutic moiety. Such 
a therapeutic moiety can be a cytotoxic agent or a radioisotope. 

[22] Compounds and compositions are also provided. Other aspects of the 
invention will become apparent to the skilled artisan by the following description of the 
invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[NOT APPLICABLE] 

DETAILED DESCRIPTION OF THE INVENTION 
[23] The present invention provides novel methods for diagnosis and 
prognosis evaluation for colorectal cancer, as well as methods for screening for compositions 
which modulate colorectal cancer. The methods herein are related to those of U.S. Patent 
Application Serial No. 09/525,993 and International Patent Application No. 
PCT/US00/07044, each of which is incorporated herein in its entirety. 






[24] By "colorectal cancer" herein is meant a colon and/or rectal tumor or
cancer that is classified as Dukes stage A or B as well as metastatic tumors classified as
Dukes stage C or D (see, e.g., Cohen et al., Cancer of the Colon, in Cancer: Principles and
Practice of Oncology, pp. 1144-1197 (Devita et al., eds., 5th ed. 1997); see also Harrison's
Principles of Internal Medicine, pp. 1289-129 (Wilson et al., eds., 12th ed., 1991)).
"Treatment, monitoring, detection or modulation of colorectal cancer" includes treatment,
monitoring, detection, or modulation of colorectal disease in those patients who have
colorectal disease (Dukes stage A, B, C or D) in which gene expression from a gene in Table
1 or 2, is increased or decreased, indicating that the subject is more likely to progress to 
metastatic disease than a patient who does not have an increase or decrease in gene 
expression of a gene in Table 1 or 2. In Dukes stage A, the tumor has penetrated into, but not 
through, the bowel wall. In Dukes stage B, the tumor has penetrated through the bowel wall 
but there is not yet any lymph involvement. In Dukes stage C, the cancer involves regional 
lymph nodes. In Dukes stage D, there is distant metastasis, e.g., liver, lung, etc. 
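
The staging rules just stated can be summarized as a small decision function. This is only an illustrative sketch: the function name and its three boolean inputs are hypothetical, and it assumes a tumor that has at least penetrated into the bowel wall, with the most advanced finding taking precedence.

```python
def dukes_stage(through_bowel_wall, regional_nodes_involved, distant_metastasis):
    """Map three findings to a Dukes stage letter (illustrative only).

    Assumes the tumor has at least penetrated into the bowel wall.
    The most advanced finding determines the stage.
    """
    if distant_metastasis:
        return "D"  # distant metastasis, e.g., liver or lung
    if regional_nodes_involved:
        return "C"  # regional lymph nodes involved
    if through_bowel_wall:
        return "B"  # through the bowel wall, no lymph involvement
    return "A"      # into, but not through, the bowel wall


stage = dukes_stage(through_bowel_wall=True,
                    regional_nodes_involved=False,
                    distant_metastasis=False)
```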

[25] Table 1 provides unigene cluster identification numbers for the
nucleotide sequence of genes that exhibit increased expression in colorectal cancer samples.
Table 1 also provides an exemplar accession number that provides a nucleotide sequence
that is part of the unigene cluster. Table 2 provides the nucleic acid and protein sequence of
the CBF9 gene as well as the Unigene and Exemplar accession numbers for CBF9.

[26] In one aspect, the expression levels of genes are determined in 
different patient samples for which either diagnosis or prognosis information is desired, to 
provide expression profiles. An expression profile of a particular sample is essentially a 
"fingerprint" of the state of the sample; while two states may have any particular gene 
similarly expressed, the evaluation of a number of genes simultaneously allows the 
generation of a gene expression profile that is unique to the state of the cell. That is, normal 
tissue may be distinguished from colorectal cancer tissue, and within colorectal cancer 
tissue, different prognosis states (good or poor long term survival prospects, for example) 
may be determined. By comparing expression profiles of colon tissue in known different 
states, information regarding which genes are important (including both up- and down- 
regulation of genes) in each of these states is obtained. The identification of sequences that 
are differentially expressed in colorectal cancer versus normal colon tissue, as well as 
differential expression resulting in different prognostic outcomes, allows the use of this 
information in a number of ways. For example, a particular treatment
regime may be evaluated: does a chemotherapeutic drug act to improve the long-term
prognosis in a particular patient? Similarly, diagnosis may be done or confirmed by
comparing patient samples with the known expression profiles. Furthermore, these gene
expression profiles (or individual genes) allow screening of drug candidates with an eye to
mimicking or altering a particular expression profile; for example, screening can be done for
drugs that suppress the colorectal cancer expression profile or convert a poor prognosis
profile to a better prognosis profile. This may be done by making biochips comprising sets of
the important colorectal cancer genes, which can then be used in these screens. These
methods can also be done on the protein basis; that is, protein expression levels of the
colorectal cancer proteins can be evaluated for diagnostic and prognostic purposes or to
screen candidate agents. In addition, the colorectal cancer nucleic acid sequences can be
administered for gene therapy purposes, including the administration of antisense nucleic
acids, or the colorectal cancer proteins (including antibodies and other modulators thereof)
administered as therapeutic drugs.
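
As an illustrative sketch only, and not a description of the disclosed method, comparing a patient sample with known expression profiles could be done by correlating expression values over a shared set of genes. The gene names, the expression values, and the choice of Pearson correlation as the similarity measure are all assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant profile; correlation is undefined")
    return cov / (sd_x * sd_y)


def closest_profile(sample, reference_profiles):
    """Return (state, r) for the reference profile best correlated with the sample."""
    genes = sorted(sample)  # fixed gene order shared by all profiles
    scores = {
        state: pearson([sample[g] for g in genes], [profile[g] for g in genes])
        for state, profile in reference_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference profiles and patient sample (arbitrary units).
REFERENCE = {
    "normal": {"geneA": 1.0, "geneB": 1.2, "geneC": 5.0, "geneD": 4.5},
    "tumor": {"geneA": 6.0, "geneB": 5.5, "geneC": 1.0, "geneD": 0.8},
}
patient = {"geneA": 5.2, "geneB": 6.1, "geneC": 1.4, "geneD": 1.1}
state, r = closest_profile(patient, REFERENCE)
```

In practice a profile would span many more genes, and the similarity measure and any decision threshold would need to be validated against samples of known state.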

[27] Thus the present invention provides nucleic acid and protein
sequences that are differentially expressed in colorectal cancer, herein termed "colorectal
cancer sequences". As outlined below, colorectal cancer sequences include those that are
up-regulated (i.e. expressed at a higher level) in colorectal cancer, as well as those that are
down-regulated (i.e. expressed at a lower level) in colorectal cancer. In a preferred
embodiment, the colorectal cancer sequences are from humans; however, as will be
appreciated by those in the art, colorectal cancer sequences from other organisms may be
useful in animal models of disease and drug evaluation; thus, other colorectal cancer
sequences are provided, from vertebrates, including mammals, including rodents (rats, mice,
hamsters, guinea pigs, etc.), primates, farm animals (including sheep, goats, pigs, cows,
etc.).

[28] Colorectal cancer sequences can include both nucleic acid and amino acid
sequences. In a preferred embodiment, the colorectal cancer sequences are recombinant
nucleic acids. By the term "recombinant nucleic acid" herein is meant nucleic acid, originally
formed in vitro, in general, by the manipulation of nucleic acid by polymerases and
endonucleases, in a form not normally found in nature. Thus an isolated nucleic acid, in a
linear form, or an expression vector formed in vitro by ligating DNA molecules that are not
normally joined, are both considered recombinant for the purposes of this invention. It is
understood that once a recombinant nucleic acid is made and reintroduced into a host cell or
organism, it will replicate non-recombinantly, i.e. using the in vivo cellular machinery of the
host cell rather than in vitro manipulations; however, such nucleic acids, once produced
recombinantly, although subsequently replicated non-recombinantly, are still considered
recombinant for the purposes of the invention.

[29] Similarly, a "recombinant protein" is a protein made using recombinant 
techniques, i.e. through the expression of a recombinant nucleic acid as depicted above. A 
recombinant protein is distinguished from naturally occurring protein by at least one or more 
characteristics. For example, the protein may be isolated or purified away from some or all 
of the proteins and compounds with which it is normally associated in its wild type host, and 
thus may be substantially pure. For example, an isolated protein is unaccompanied by at least 
some of the material with which it is normally associated in its natural state, preferably 
constituting at least about 0.5%, more preferably at least about 5% by weight of the total 
protein in a given sample. A substantially pure protein comprises at least about 75% by 
weight of the total protein, with at least about 80% being preferred, and at least about 90% 
being particularly preferred. The definition includes the production of a colorectal cancer 
protein from one organism in a different organism or host cell. Alternatively, the protein may 
be made at a significantly higher concentration than is normally seen, through the use of an 
inducible promoter or high expression promoter, such that the protein is made at increased 
concentration levels. Alternatively, the protein may be in a form not normally found in 
nature, as in the addition of an epitope tag or amino acid substitutions, insertions and 
deletions, as discussed below. 

[30] In a preferred embodiment, the colorectal cancer sequences are 
nucleic acids. As will be appreciated by those in the art and is more fully outlined below, 
colorectal cancer sequences are useful in a variety of applications, including diagnostic 
applications, which will detect naturally occurring nucleic acids, as well as screening 
applications; for example, biochips comprising nucleic acid probes to the colorectal cancer 
sequences can be generated. In the broadest sense, then, by "nucleic acid" or
"oligonucleotide" or grammatical equivalents herein is meant at least two nucleotides
covalently linked together. A nucleic acid of the present invention will generally contain
phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are 
included that may have alternate backbones, comprising, for example, phosphoramidate 
(Beaucage et al., Tetrahedron 49(10): 1925 (1993) and references therein; Letsinger, J. Org. 
Chem. 35:3800 (1970); Sprinzl et al., Eur. J. Biochem. 81:579 (1977); Letsinger et al., Nucl. 
Acids Res. 14:3487 (1986); Sawai et al., Chem. Lett. 805 (1984); Letsinger et al., J. Am.
Chem. Soc. 110:4470 (1988); and Pauwels et al., Chemica Scripta 26:141 (1986)),
phosphorothioate (Mag et al., Nucleic Acids Res. 19:1437 (1991); and U.S. Patent No.
5,644,048), phosphorodithioate (Briu et al., J. Am. Chem. Soc. 111:2321 (1989)),
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical
Approach, Oxford University Press), and peptide nucleic acid backbones and linkages (see
Egholm, J. Am. Chem. Soc. 114:1895 (1992); Meier et al., Chem. Int. Ed. Engl. 31:1008
(1992); Nielsen, Nature, 365:566 (1993); Carlsson et al., Nature 380:207 (1996), all of which
are incorporated by reference). Other analog nucleic acids include those with positive
backbones (Denpcy et al., Proc. Natl. Acad. Sci. USA 92:6097 (1995)); non-ionic backbones
(U.S. Patent Nos. 5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863; Kiedrowshi et
al., Angew. Chem. Intl. Ed. English 30:423 (1991); Letsinger et al., J. Am. Chem. Soc.
110:4470 (1988); Letsinger et al., Nucleoside & Nucleotide 13:1597 (1994); Chapters 2 and
3, ASC Symposium Series 580, "Carbohydrate Modifications in Antisense Research", Ed.
Y.S. Sanghui and P. Dan Cook; Mesmaeker et al., Bioorganic & Medicinal Chem. Lett. 4:395
(1994); Jeffs et al., J. Biomolecular NMR 34:17 (1994); Tetrahedron Lett. 37:743 (1996)) and
non-ribose backbones, including those described in U.S. Patent Nos. 5,235,033 and
5,034,506, and Chapters 6 and 7, ASC Symposium Series 580, "Carbohydrate Modifications
in Antisense Research", Ed. Y.S. Sanghui and P. Dan Cook. Nucleic acids containing one or
more carbocyclic sugars are also included within one definition of nucleic acids (see Jenkins
et al., Chem. Soc. Rev. (1995) pp. 169-176). Several nucleic acid analogs are described in
Rawls, C & E News June 2, 1997 page 35. All of these references are hereby expressly 
incorporated by reference. These modifications of the ribose-phosphate backbone may be 
done for a variety of reasons, for example to increase the stability and half-life of such 
molecules in physiological environments or as probes on a biochip. 

[31] As will be appreciated by those in the art, all of these nucleic acid 
analogs may find use in the present invention. In addition, mixtures of naturally occurring 
nucleic acids and analogs can be made; alternatively, mixtures of different nucleic acid 
analogs, and mixtures of naturally occurring nucleic acids and analogs may be made. 

[32] Particularly preferred are peptide nucleic acids (PNA), which include
peptide nucleic acid analogs. These backbones are substantially non-ionic under neutral 
conditions, in contrast to the highly charged phosphodiester backbone of naturally occurring 
nucleic acids. This results in two advantages. First, the PNA backbone exhibits improved 
hybridization kinetics. PNAs have larger changes in the melting temperature (Tm) for 
mismatched versus perfectly matched basepairs. DNA and RNA typically exhibit a 2-4°C 
drop in Tm for an internal mismatch. With the non-ionic PNA backbone, the drop is closer to
7-9°C. Similarly, due to their non-ionic nature, hybridization of the bases attached to these
backbones is relatively insensitive to salt concentration. In addition, PNAs are not degraded 
by cellular enzymes, and thus can be more stable. 

[33] The nucleic acids may be single stranded or double stranded as
specified, or contain portions of both double stranded and single stranded sequence. As will be
appreciated by those in the art, the depiction of a single strand ("Watson") also defines the
sequence of the other strand ("Crick"); thus the sequences described herein also include the
complement of the sequence. The nucleic acid may be DNA, both genomic and cDNA, RNA,
or a hybrid, where the nucleic acid contains any combination of deoxyribo- and
ribonucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine,
guanine, inosine, xanthine, hypoxanthine, isocytosine, isoguanine, etc. As used herein the
term "nucleoside" includes nucleotides and nucleoside and nucleotide analogs, and modified 
nucleosides such as amino modified nucleosides. In addition, "nucleoside" includes non- 
naturally occurring analog structures. Thus for example the individual units of a peptide 
nucleic acid, each containing a base, are referred to herein as a nucleoside. 
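
The statement that one strand defines the other can be illustrated with a short sketch that derives the complementary strand, read 5' to 3', from a DNA sequence. The function name and the example sequence are hypothetical, and the sketch handles only the four unambiguous DNA bases, not RNA or the modified bases listed above.

```python
# Complement table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary strand of a DNA sequence, read 5' to 3'."""
    invalid = set(seq) - set("ACGTacgt")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


watson = "ATGGCCTTAG"  # hypothetical example sequence
crick = reverse_complement(watson)
```

Applying the function twice returns the original sequence, which is a convenient sanity check.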

[34] A colorectal cancer sequence can be initially identified by substantial
nucleic acid and/or amino acid sequence homology to the colorectal cancer sequences
outlined herein. Such homology can be based upon the overall nucleic acid or amino acid
sequence, and is generally determined as outlined below, using either homology programs or
hybridization conditions.

[35] The isolation of mRNA comprises isolating total cellular RNA by
disrupting a cell and performing differential centrifugation. Once the total RNA is isolated,
mRNA is isolated by making use of the adenine nucleotide residues known to those skilled in
the art as a poly(A) tail found on virtually every eukaryotic mRNA molecule at the 3' end
thereof. Oligonucleotides composed of only deoxythymidine [oligo(dT)] are linked to
cellulose and the oligo(dT)-cellulose packed into small columns. When a preparation of total
cellular RNA is passed through such a column, the mRNA molecules bind to the oligo(dT) by
the poly(A) tails while the rest of the RNA flows through the column. The bound mRNAs
are then eluted from the column and collected.

[36] The colorectal cancer sequences of the invention can be identified as
follows. Samples of normal and tumor tissue are applied to biochips comprising nucleic acid
probes. The samples are first microdissected, if applicable, and treated as described above
for the preparation of mRNA. Suitable biochips are commercially available, for example
from Affymetrix. Gene expression profiles as described herein are generated, and the data
analyzed.

[37] In a preferred embodiment, the genes showing changes in expression 
as between normal and disease states are compared to genes expressed in other normal 
tissues, including, but not limited to lung, heart, brain, liver, breast, kidney, muscle, prostate, 
small intestine, large intestine, spleen, bone, and placenta. In a preferred embodiment, those 
genes identified during the colorectal cancer screen that are expressed in any significant 
amount in other tissues are removed from the profile, although in some embodiments, this is 
not necessary. That is, when screening for drugs, it is preferable that the target be disease 
specific, to minimize possible side effects. 
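
The filtering step described in this paragraph might be sketched as follows. The gene names, tissue expression values, and the numeric cutoff standing in for "any significant amount" are hypothetical; the text does not specify a threshold.

```python
def disease_specific(candidates, normal_tissue_expression, threshold):
    """Keep candidate genes whose expression is below `threshold` in every
    other normal tissue surveyed.

    Genes with no survey data are kept (an assumption of this sketch).
    """
    kept = []
    for gene in candidates:
        levels = normal_tissue_expression.get(gene, {})
        if all(level < threshold for level in levels.values()):
            kept.append(gene)
    return kept


# Hypothetical expression levels (arbitrary units) in other normal tissues.
NORMAL_TISSUES = {
    "geneA": {"lung": 5, "heart": 8, "liver": 3},
    "geneB": {"lung": 250, "heart": 12, "liver": 9},  # significant in lung
    "geneC": {"lung": 2, "heart": 1, "liver": 4},
}
profile = disease_specific(["geneA", "geneB", "geneC"], NORMAL_TISSUES, threshold=50)
```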

[38] In a preferred embodiment, colorectal cancer sequences are those that 
are up-regulated in colorectal cancer; that is, the expression of these genes is higher in
colorectal carcinoma as compared to normal colon tissue. "Up-regulation" as used herein 
means at least about a 1.1 fold change, preferably a 1.5 or two fold change, preferably at least 
about a three fold change, with at least about five-fold or higher being preferred. All 
accession numbers herein are for the GenBank sequence database and the sequences of the 
accession numbers are hereby expressly incorporated by reference. GenBank is known in the 
art, see, e.g., Benson, DA, et al., Nucleic Acids Research 26:1-7 (1998) and 
http://www.ncbi.nlm.nih.gov/. In addition, these genes were found to be expressed in a 
limited amount or not at all in heart, brain, lung, liver, breast, kidney, prostate, small intestine 
and spleen. 

[39] In a preferred embodiment, colorectal cancer sequences are those that 
are down-regulated in colorectal cancer; that is, the expression of these genes is lower in
colorectal carcinoma as compared to normal colon tissue. "Down-regulation" as used herein 
means at least about a two-fold change, preferably at least about a three fold change, with at 
least about five-fold or higher being preferred. 
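
The fold-change definitions of up-regulation and down-regulation can be sketched as a simple classifier. The defaults below take the minimum values stated in the text (about 1.1 fold for up-regulation, about two-fold for down-regulation) and treat them as hard cutoffs, which is a simplification of "at least about"; the function name and the example values are hypothetical.

```python
def classify_regulation(tumor, normal, up_min=1.1, down_min=2.0):
    """Classify a gene from its tumor and normal expression levels.

    Returns (label, fold), where fold is tumor/normal for up-regulated genes
    and normal/tumor for down-regulated genes.
    """
    if tumor <= 0 or normal <= 0:
        raise ValueError("expression values must be positive")
    if tumor / normal >= up_min:
        return "up", tumor / normal
    if normal / tumor >= down_min:
        return "down", normal / tumor
    return "unchanged", max(tumor / normal, normal / tumor)


up_call = classify_regulation(tumor=300.0, normal=100.0)
down_call = classify_regulation(tumor=50.0, normal=200.0)
flat_call = classify_regulation(tumor=100.0, normal=100.0)
```

The stricter preferred levels (three-fold, five-fold) can be applied by passing larger values for `up_min` and `down_min`.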

[40] Colorectal cancer proteins of the present invention may be classified 
as secreted proteins, transmembrane proteins or intracellular proteins. In a preferred 
embodiment the colorectal cancer protein is an intracellular protein. Intracellular proteins 
may be found in the cytoplasm and/or in the nucleus. Intracellular proteins are involved in all 
aspects of cellular function and replication (including, for example, signaling pathways); 
aberrant expression of such proteins results in unregulated or disregulated cellular processes. 
For example, many intracellular proteins have enzymatic activity such as protein kinase 
activity, protein phosphatase activity, protease activity, nucleotide cyclase activity,
polymerase activity and the like. Intracellular proteins also serve as docking proteins that are
involved in organizing complexes of proteins, or targeting proteins to various subcellular 
localizations, and are involved in maintaining the structural integrity of organelles. 

[41] An increasingly appreciated concept in characterizing intracellular 
proteins is the presence in the proteins of one or more motifs for which defined functions 
have been attributed. In addition to the highly conserved sequences found in the enzymatic 
domain of proteins, highly conserved sequences have been identified in proteins that are 
involved in protein-protein interaction. For example, Src-homology-2 (SH2) domains bind 
tyrosine-phosphorylated targets in a sequence dependent manner. PTB domains, which are 
distinct from SH2 domains, also bind tyrosine phosphorylated targets. SH3 domains bind to 
proline-rich targets. In addition, PH domains, tetratricopeptide repeats and WD domains, to
name only a few, have been shown to mediate protein-protein interactions. Some of these
may also be involved in binding to phospholipids or other second messengers. As will be
appreciated by one of ordinary skill in the art, these motifs can be identified on the basis of
primary sequence; thus, an analysis of the sequence of proteins may provide insight into both
the enzymatic potential of the molecule and/or molecules with which the protein may
associate.
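
As a toy illustration of identifying a motif from primary sequence, the sketch below scans a protein for the minimal "PxxP" core often cited for proline-rich SH3 ligands. The pattern is an assumed simplification that is not taken from this text, the example sequence is hypothetical, and real domain or motif identification generally relies on more sensitive profile-based methods.

```python
import re

# Assumed, simplified pattern: proline, any two residues, proline.
# The lookahead lets overlapping occurrences be reported.
PXXP = re.compile(r"(?=(P[A-Z]{2}P))")


def find_motifs(protein, pattern=PXXP):
    """Return (1-based start, matched text) for every match, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


seq = "MKAPPLPPRSTPAAPG"  # hypothetical protein fragment
hits = find_motifs(seq)
```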



[42] In a preferred embodiment, the colorectal cancer sequences are 
transmembrane proteins. Transmembrane proteins are molecules that span the phospholipid
bilayer of a cell. They may have an intracellular domain, an extracellular domain, or both.
The intracellular domains of such proteins may have a number of functions including those
already described for intracellular proteins. For example, the intracellular domain may have
enzymatic activity and/or may serve as a binding site for additional proteins. Frequently the
intracellular domain of transmembrane proteins serves both roles. For example, certain
receptor tyrosine kinases have both protein kinase activity and SH2 domains. In addition,
autophosphorylation of tyrosines on the receptor molecule itself creates binding sites for
additional SH2 domain containing proteins. 

[43] Transmembrane proteins may contain from one to many 
transmembrane domains. For example, receptor tyrosine kinases, certain cytokine receptors, 
receptor guanylyl cyclases and receptor serine/threonine protein kinases contain a single 
transmembrane domain. However, various other proteins including channels and adenylyl 
cyclases contain numerous transmembrane domains. Many important cell surface receptors 
are classified as "seven transmembrane domain" proteins, as they contain 7 membrane 
spanning regions. Important transmembrane protein receptors include, but are not limited to, 
insulin receptor, insulin-like growth factor receptor, human growth hormone receptor, 
glucose transporters, transferrin receptor, epidermal growth factor receptor, low density 
lipoprotein receptor, leptin receptor, interleukin receptors, 
e.g., IL-1 receptor, IL-2 receptor, etc. 

[44] Characteristics of transmembrane domains include approximately 20 
consecutive hydrophobic amino acids that may be followed by charged amino acids. 
Therefore, upon analysis of the amino acid sequence of a particular protein, the localization 
and number of transmembrane domains within the protein may be predicted. 
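
As an illustration of the prediction described in paragraph [44], the following minimal sketch scans a protein sequence with a sliding window of 20 residues and reports stretches whose average hydrophobicity is high. The Kyte-Doolittle hydropathy scale, the threshold of 1.6 and the function name `predict_tm_segments` are assumptions made for this example; the text itself does not name a scale or a cutoff.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_tm_segments(seq, window=20, threshold=1.6):
    """Return (start, end) spans, 0-based and end-exclusive, covered by
    windows whose mean hydropathy is at least `threshold` (assumed cutoff)."""
    seq = seq.upper()
    segments = []
    for start in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[start:start + window]) / window
        if mean < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlapping or adjacent hydrophobic windows form one segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Hypothetical sequence: charged flanks around a 22-residue hydrophobic core.
example = "DEKR" * 3 + "LIVAF" * 4 + "LL" + "KRDE" * 3
print(predict_tm_segments(example))
```

The number of spans returned gives a rough estimate of how many transmembrane domains the protein may contain; a real analysis would normally use a dedicated prediction tool.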

[45] The extracellular domains of transmembrane proteins are diverse; 
however, conserved motifs are found repeatedly among various extracellular domains. 
Conserved structure and/or functions have been ascribed to different extracellular motifs. For 
example, cytokine receptors are characterized by a cluster of cysteines and a WSXWS (W = 
tryptophan, S = serine, X = any amino acid) motif. Immunoglobulin-like domains are highly 
conserved. Mucin-like domains may be involved in cell adhesion and leucine-rich repeats 
participate in protein-protein interactions. 
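
The WSXWS motif mentioned in paragraph [45] can be located with a simple pattern search. The sketch below is only an illustration; the function name `find_wsxws` and the example sequence are hypothetical.

```python
import re

# W-S-any residue-W-S, written for one-letter amino acid codes.
WSXWS = re.compile(r"WS[A-Z]WS")


def find_wsxws(protein):
    """Return the 0-based start positions of WSXWS motifs in a protein sequence."""
    return [m.start() for m in WSXWS.finditer(protein.upper())]


print(find_wsxws("MKTAWSEWSPLC"))
```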

[46] Many extracellular domains are involved in binding to other 
molecules. In one aspect, extracellular domains are receptors. Factors that bind the receptor 
domain include circulating ligands, which may be peptides, proteins, or small molecules such 
as adenosine and the like. For example, growth factors such as EGF, FGF and PDGF are 
circulating growth factors that bind to their cognate receptors to initiate a variety of cellular 
responses. Other factors include cytokines, mitogenic factors, neurotrophic factors and the 
like. Extracellular domains also bind to cell-associated molecules. In this respect, they 
mediate cell-cell interactions. Cell-associated ligands can be tethered to the cell for example 
via a glycosylphosphatidylinositol (GPI) anchor, or may themselves be transmembrane 
proteins. Extracellular domains also associate with the extracellular matrix and contribute to 
the maintenance of the cell structure. 

[47] Colorectal cancer proteins that are transmembrane are particularly 
preferred in the present invention as they are good targets for immunotherapeutics, as are 
described herein. In addition, as outlined below, transmembrane proteins can also be useful 
in imaging modalities. 

[48] It will also be appreciated by those in the art that a transmembrane 
protein can be made soluble by removing transmembrane sequences, for example through 
recombinant methods. Furthermore, transmembrane proteins that have been made soluble 
can be made to be secreted through recombinant means by adding an appropriate signal 
sequence. 

[49] In a preferred embodiment, the colorectal cancer proteins are secreted 
proteins, the secretion of which can be either constitutive or regulated. These proteins have a 
signal peptide or signal sequence that targets the molecule to the secretory pathway. Secreted 
proteins are involved in numerous physiological events; by virtue of their circulating nature, 
they serve to transmit signals to various other cell types. The secreted protein may function in 
an autocrine manner (acting on the cell that secreted the factor), a paracrine manner (acting 
on cells in close proximity to the cell that secreted the factor) or an endocrine manner (acting 
on cells at a distance). Thus secreted molecules find use in modulating or altering numerous 
aspects of physiology. Colorectal cancer proteins that are secreted proteins are particularly 
preferred in the present invention as they serve as good targets for diagnostic markers, for 
example for blood tests. 

[50] A colorectal cancer sequence is initially identified by substantial 
nucleic acid and/or amino acid sequence homology to the colorectal cancer sequences 
outlined herein. Such homology can be based upon the overall nucleic acid or amino acid 
sequence, and is generally determined as outlined below, using either homology programs or 
hybridization conditions. 

[51] As used herein, the terms "colorectal cancer nucleic acid", "colorectal 
cancer protein", "colorectal cancer polynucleotide" or "colorectal cancer-associated 
transcript" refer to nucleic acid and polypeptide polymorphic variants, alleles, mutants, and 
interspecies homologs that: (1) have a nucleotide sequence that has greater than about 60% 
nucleotide sequence identity, 65%, 70%, 75%, 80%, 85%, 90%, preferably 91%, 92%, 93%, 
94%, 95%, 96%, 97%, 98% or 99% or greater nucleotide sequence identity, preferably over a 
region of at least about 25, 50, 100, 200, 500, 1000, or more nucleotides, to a 
nucleotide sequence of or associated with a unigene cluster of Table 1 or Table 2; (2) bind to 
antibodies, e.g., polyclonal antibodies, raised against an immunogen comprising an amino 
acid sequence encoded by a nucleotide sequence of or associated with a unigene cluster of 
Table 1 or Table 2, and conservatively modified variants thereof; (3) specifically hybridize 
under stringent hybridization conditions to a nucleic acid sequence, or the complement 
thereof, of Table 1 or Table 2 and conservatively modified variants thereof; or (4) have an 
amino acid sequence that has greater than about 60% amino acid sequence identity, 65%, 
70%, 75%, 80%, 85%, 90%, preferably 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% 
or greater amino acid sequence identity, preferably over a region of at least about 
25, 50, 100, 200, 500, 1000, or more amino acids, to an amino acid sequence encoded by a 
nucleotide sequence of or associated with a unigene cluster of Table 1 or Table 2. A 
polynucleotide or polypeptide sequence is typically from a mammal including, but not 
limited to, primate, e.g., human; rodent, e.g., rat, mouse, hamster; cow, pig, horse, sheep, or 
other mammal. A "colorectal cancer polypeptide" and a "colorectal cancer polynucleotide" 
include both naturally occurring and recombinant forms. 

[52] Homology in this context means sequence similarity or identity, with 
identity being preferred. A preferred comparison for homology purposes is to compare the 
sequence containing sequencing errors to the correct sequence. This homology will be 
determined using standard techniques known in the art, including, but not limited to, the local 
homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the 
homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by 
the search for similarity method of Pearson & Lipman, PNAS USA 85:2444 (1988), by 
computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA 
in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Drive, 
Madison, WI), the Best Fit sequence program described by Devereux et al., Nucl. Acid Res. 
12:387-395 (1984), preferably using the default settings, or by inspection. 

[53] In a preferred embodiment, the sequences which are used to determine 
sequence identity or similarity are selected from the sequences set forth in Table 1 or Table 2. 
In one embodiment the sequences utilized herein are those set forth in Table 1 or Table 2. In 
another embodiment, the sequences are naturally occurring allelic variants of the sequences 
set forth in Table 1 or Table 2. In another embodiment, the sequences are sequence variants 
as further described herein. 

[54] The terms "identical" or percent "identity," in the context of two or 
more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences 
that are the same or have a specified percentage of amino acid residues or nucleotides that are 
the same (i.e., about 60% identity, preferably 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 
94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region, when compared 
and aligned for maximum correspondence over a comparison window or designated region) 
as measured using the BLAST or BLAST 2.0 sequence comparison algorithms with default 
parameters described below, or by manual alignment and visual inspection (see, e.g., NCBI 
web site http://www.ncbi.nlm.nih.gov/BLAST/ or the like). Such sequences are then said to 
be "substantially identical." This definition also refers to, or may be applied to, the 
complement of a test sequence. The definition also includes sequences that have deletions 
and/or additions, as well as those that have substitutions, as well as naturally occurring, e.g., 
polymorphic or allelic variants, and man-made variants. As described below, the preferred 
algorithms can account for gaps and the like. Preferably, identity exists over a region that is 
at least about 25 amino acids or nucleotides in length, or more preferably over a region that is 
50-100 amino acids or nucleotides in length. 
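
To make the notion of percent identity over a region concrete, the following sketch computes it for two sequences that have already been aligned (gaps written as "-"). Counting gap columns in the denominator and the helper name `percent_identity` are assumptions of this example; alignment programs differ in how they treat gaps.

```python
def percent_identity(aligned_a, aligned_b, min_region=25):
    """Percent of aligned columns holding the same residue.

    `min_region` mirrors the preferred minimum region of about 25 residues;
    gap columns count against identity (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if len(columns) < min_region:
        raise ValueError("compared region is shorter than min_region")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


reference = "ACGT" * 10                    # 40 aligned positions
test = list(reference)
for i, base in ((0, "C"), (10, "T"), (20, "C"), (30, "T")):
    test[i] = base                         # introduce four substitutions
print(percent_identity(reference, "".join(test)))
```

A value at or above the chosen cutoff (for example about 60%, or one of the higher preferred values) would then mark the pair as substantially identical in the sense used here.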

[55] For sequence comparison, typically one sequence acts as a reference 
sequence, to which test sequences are compared. When using a sequence comparison 
algorithm, test and reference sequences are entered into a computer, subsequence coordinates 
are designated, if necessary, and sequence algorithm program parameters are designated. 
Preferably, default program parameters can be used, or alternative parameters can be 
designated. The sequence comparison algorithm then calculates the percent sequence 
identities for the test sequences relative to the reference sequence, based on the program 
parameters. 

[56] A "comparison window", as used herein, includes reference to a 
segment of any one of the number of contiguous positions selected from the group consisting 
typically of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 
150 in which a sequence may be compared to a reference sequence of the same number of 
contiguous positions after the two sequences are optimally aligned. Methods of alignment of 
sequences for comparison are well-known in the art. Optimal alignment of sequences for 
comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, 
Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & 
Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & 
Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988), by computerized implementations of 
these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics 
Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual 
alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel 
et al., eds. 1995 supplement)). 
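
For readers unfamiliar with the local homology approach cited above, the sketch below computes a Smith-Waterman style local alignment score by dynamic programming. The scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration; they are not the defaults of any program named in this document.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap cost)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Only two rows of the scoring matrix are kept in memory, which is enough when the score alone is needed; recovering the alignment itself would require storing the full matrix and tracing back.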

[57] Preferred examples of algorithms that are suitable for determining 
percent sequence identity and sequence similarity include the BLAST and BLAST 2.0 
algorithms, which are described in Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997) and 
Altschul et al., J. Mol. Biol. 215:403-410 (1990). BLAST and BLAST 2.0 are used, with the 
parameters described herein, to determine percent sequence identity for the nucleic acids and 
proteins of the invention. Software for performing BLAST analyses is publicly available 
through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). 
This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying 
short words of length W in the query sequence, which either match or satisfy some 
positive-valued threshold score T when aligned with a word of the same length in a database 
sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). 
These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs 
containing them. The word hits are extended in both directions along each sequence for as 
far as the cumulative alignment score can be increased. Cumulative scores are calculated 
using, e.g., for nucleotide sequences, the parameters M (reward score for a pair of matching 
residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino 
acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the 
word hits in each direction is halted when: the cumulative alignment score falls off by the 
quantity X from its maximum achieved value; the cumulative score goes to zero or below, 
due to the accumulation of one or more negative-scoring residue alignments; or the end of 
either sequence is reached. The BLAST algorithm parameters W, T, and X determine the 
sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) 
uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a 
comparison of both strands. For amino acid sequences, the BLASTP program uses as 
defaults a wordlength of 3, and expectation (E) of 10, and the BLOSUM62 scoring matrix 
(see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)) alignments (B) of 
50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands. 
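
The seed-and-extend procedure just described can be sketched in a few lines. This is a simplified illustration, not the BLAST implementation: seeds are exact word matches only (the neighborhood threshold T is omitted), extension is ungapped, and the drop-off value X=20 and all function names are assumptions. The defaults W=11, M=5 and N=-4 are the BLASTN defaults quoted above.

```python
def find_seeds(query, subject, w=11):
    """Exact word matches of length w, as (query_pos, subject_pos) pairs."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def _extend(base, pairs, match, mismatch, x_drop):
    """Extend along residue pairs; return (best score, residues added)."""
    total = best = base
    best_len = 0
    for n, (a, b) in enumerate(pairs, start=1):
        total += match if a == b else mismatch
        if total > best:
            best, best_len = total, n
        elif best - total >= x_drop or total <= 0:
            break  # score fell off by X from its maximum, or reached zero
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, w=11,
                match=5, mismatch=-4, x_drop=20):
    """Ungapped extension of an exact seed; returns (score, q_start, q_end)."""
    score = match * w  # an exact seed scores M per position
    score, right = _extend(score, zip(query[q_pos + w:], subject[s_pos + w:]),
                           match, mismatch, x_drop)
    score, left = _extend(score,
                          zip(reversed(query[:q_pos]), reversed(subject[:s_pos])),
                          match, mismatch, x_drop)
    return score, q_pos - left, q_pos + w + right


query, subject = "GGGACGTACGTCCC", "TTTACGTACGTAAA"
print(extend_seed(query, subject, 3, 3, w=4))
```

In the hypothetical example a word length of 4 is used so that the short sequences still produce a seed; the extension grows the seed "ACGT" into the full shared stretch "ACGTACGT".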

[58] The BLAST algorithm also performs a statistical analysis of the 
similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 
90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the 
smallest sum probability (P(N)), which provides an indication of the probability by which a 
match between two nucleotide or amino acid sequences would occur by chance. For 
example, a nucleic acid is considered similar to a reference sequence if the smallest sum 
probability in a comparison of the test nucleic acid to the reference nucleic acid is less than 
about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001. 
Log values may be large negative numbers, e.g., 5, 10, 20, 30, 40, 40, 70, 90, 110, 150, 170, 
etc. 
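
The three probability cutoffs in paragraph [58] can be applied as a simple tiered test. The sketch below is illustrative only; the function name and the tier labels are invented for this example, and the cutoffs are treated as exact although the text qualifies each with "about".

```python
def similarity_tier(smallest_sum_probability):
    """Map a smallest sum probability P(N) to the tiers named in the text."""
    p = smallest_sum_probability
    if p < 0.001:
        return "most preferred"
    if p < 0.01:
        return "more preferred"
    if p < 0.2:
        return "similar"
    return "not similar"


for p in (0.5, 0.05, 0.005, 1e-30):
    print(p, similarity_tier(p))
```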

[59] In one embodiment, the nucleic acid homology is determined through 
hybridization studies. Thus, for example, nucleic acids which hybridize under high 
stringency to the nucleic acid sequences which encode the peptides identified in Table 1 or 
Table 2, or their complements, are considered a colorectal cancer sequence. High stringency 
conditions are known in the art; see for example Maniatis et al., Molecular Cloning: A 
Laboratory Manual, 2d Edition, 1989, and Short Protocols in Molecular Biology, ed. 
Ausubel, et al., both of which are hereby incorporated by reference. Stringent conditions are 
sequence-dependent and will be different in different circumstances. Longer sequences 
hybridize specifically at higher temperatures. An extensive guide to the hybridization of 
nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology- 
Hybridization with Nucleic Acid Probes, "Overview of principles of hybridization and the 
strategy of nucleic acid assays" (1993). Generally, stringent conditions are selected to be 
about 5-10°C lower than the thermal melting point (Tm) for the specific sequence at a 
defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH and 
nucleic acid concentration) at which 50% of the probes complementary to the target hybridize 
to the target sequence at equilibrium (as the target sequences are present in excess, at Tm 
50% of the probes are occupied at equilibrium). Stringent conditions will be those in which 
the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M 
sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 
30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., 
greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of 
destabilizing agents such as formamide. 
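
As a worked example of choosing a temperature about 5-10°C below the Tm, the sketch below first estimates Tm for a short oligonucleotide and then derives the stringent temperature window. The Tm estimate uses the Wallace rule (2°C per A or T, 4°C per G or C), which is an assumption brought in for illustration and is not taken from this document; it is only a rough guide for short probes and ignores salt and formamide effects. Function names are hypothetical.

```python
def wallace_tm(oligo):
    """Rough Tm in °C for a short oligonucleotide (Wallace rule; assumed)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def stringent_window(tm_c):
    """Temperature range about 5-10 °C below Tm, as (low, high)."""
    return tm_c - 10, tm_c - 5


probe = "ACGT" * 4            # hypothetical 16-mer
tm = wallace_tm(probe)
print(tm, stringent_window(tm))
```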

[60] In another embodiment, less stringent hybridization conditions are 
used; for example, moderate or low stringency conditions may be used, as are known in the 
art; see Maniatis and Ausubel, supra, and Tijssen, supra. For selective or specific 
hybridization, a positive signal is at least two times background, preferably 10 times 
background hybridization. Exemplary stringent hybridization conditions can be as follows: 
50% formamide, 5x SSC, and 1% SDS, incubating at 42°C, or, 5x SSC, 1% SDS, incubating 
at 65°C, with wash in 0.2x SSC, and 0.1% SDS at 65°C. 

[61] Nucleic acids that do not hybridize to each other under stringent 
conditions are still substantially identical if the polypeptides which they encode are 
substantially identical. This occurs, for example, when a copy of a nucleic acid is created 
using the maximum codon degeneracy permitted by the genetic code. In such cases the 
nucleic acids typically hybridize under moderately stringent hybridization conditions. 
Exemplary "moderately stringent hybridization conditions" include a hybridization in a 
buffer of 40% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 1X SSC at 45°C. A 
positive hybridization is at least twice background. Those of ordinary skill will readily 
recognize that alternative hybridization and wash conditions can be utilized to provide 
conditions of similar stringency. Additional guidelines for determining hybridization 
parameters are provided in numerous references, e.g., Current Protocols in Molecular 
Biology, ed. Ausubel, et al. 

[62] For PCR, a temperature of about 36°C is typical for low stringency 
amplification, although annealing temperatures may vary between about 32°C and 48°C 
depending on primer length. For high stringency PCR amplification, a temperature of about 
62°C is typical, although high stringency annealing temperatures can range from about 50°C 
to about 65°C, depending on the primer length and specificity. Typical cycle conditions for 
both high and low stringency amplifications include a denaturation phase of 90°C - 95°C for 
30 sec - 2 min., an annealing phase lasting 30 sec. - 2 min., and an extension phase of about 
72°C for 1 - 2 min. Protocols and guidelines for low and high stringency amplification 
reactions are provided, e.g., in Innis et al., PCR Protocols, A Guide to Methods and 
Applications (1990). 
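
The annealing temperature ranges given in paragraph [62] can be captured as a small lookup, for example to check a planned protocol. The class and function names below are hypothetical; the numbers are the approximate values stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnealingProfile:
    typical_c: float  # typical annealing temperature, °C
    min_c: float      # lower end of the stated range, °C
    max_c: float      # upper end of the stated range, °C


# Approximate values from paragraph [62].
PROFILES = {
    "low": AnnealingProfile(typical_c=36, min_c=32, max_c=48),
    "high": AnnealingProfile(typical_c=62, min_c=50, max_c=65),
}


def annealing_in_range(stringency, temp_c):
    """True if temp_c lies within the stated range for 'low' or 'high' stringency."""
    profile = PROFILES[stringency]
    return profile.min_c <= temp_c <= profile.max_c


print(annealing_in_range("high", 62), annealing_in_range("low", 62))
```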

[63] In addition, the colorectal cancer nucleic acid sequences of the 
invention are fragments of larger genes, i.e. they are nucleic acid segments. "Genes" in this 
context includes coding regions, non-coding regions, and mixtures of coding and non-coding 
regions. Accordingly, as will be appreciated by those in the art, using the sequences provided 
herein, additional sequences of the colorectal cancer genes can be obtained, using techniques 
well known in the art for cloning either longer sequences or the full length sequences; see 
Maniatis et al., and Ausubel, et al., supra, hereby expressly incorporated by reference. 

[64] An indication that two nucleic acid sequences or polypeptides are 
substantially identical is that the polypeptide encoded by the first nucleic acid is 
immunologically cross reactive with the antibodies raised against the polypeptide encoded by 
the second nucleic acid. Thus, a polypeptide is typically substantially identical to a second 
polypeptide, e.g., where the two peptides differ only by conservative substitutions. Another 
indication that two nucleic acid sequences are substantially identical is that the two molecules 
or their complements hybridize to each other under stringent conditions, as described above. 
Yet another indication that two nucleic acid sequences are substantially identical is that the 
same primers can be used to amplify the sequences. 

[65] Once the colorectal cancer nucleic acid is identified, it can be cloned 
and, if necessary, its constituent parts recombined to form the entire colorectal cancer nucleic 
acid. Once isolated from its natural source, e.g., contained within a plasmid or other vector 
or excised therefrom as a linear nucleic acid segment, the recombinant colorectal cancer 
nucleic acid can be further used as a probe to identify and isolate other colorectal cancer 
nucleic acids, for example additional coding regions. It can also be used as a "precursor" 
nucleic acid to make modified or variant colorectal cancer nucleic acids and proteins. 

[66] The colorectal cancer nucleic acids of the present invention are used in 
several ways. In a first embodiment, nucleic acid probes to the colorectal cancer nucleic 
acids are made and attached to biochips to be used in screening and diagnostic methods, as 
outlined below, or for administration, for example for gene therapy and/or antisense 
applications. Alternatively, the colorectal cancer nucleic acids that include coding regions of 
colorectal cancer proteins can be put into expression vectors for the expression of colorectal 
cancer proteins, again either for screening purposes or for administration to a patient. 

[67] In a preferred embodiment, nucleic acid probes to colorectal cancer 
nucleic acids (both the nucleic acid sequences encoding peptides outlined in Table 1 or 
Table 2 and/or the complements thereof) are made. The nucleic acid probes attached to the 
biochip are designed to be substantially complementary to the colorectal cancer nucleic 
acids, i.e. the target sequence (either the target sequence of the sample or to other probe 
sequences, for example in sandwich assays), such that hybridization of the target sequence 
and the probes of the present invention occurs. As outlined below, this complementarity need 
not be perfect; there may be any number of base pair mismatches which will interfere with 
hybridization between the target sequence and the single stranded nucleic acids of the present 
invention. However, if the number of mutations is so great that no hybridization can occur 
under even the least stringent of hybridization conditions, the sequence is not a 
complementary target sequence. Thus, by "substantially complementary" herein is meant 
that the probes are sufficiently complementary to the target sequences to hybridize under 
normal reaction conditions, particularly high stringency conditions, as outlined herein. 
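
Complementarity between a probe and its target site can be checked by comparing the probe with the reverse complement of the site and counting mismatches. The sketch below assumes DNA written 5' to 3' with the letters A, C, G and T only; the function names are hypothetical, and how many mismatches a given assay tolerates is not something this example decides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(probe, target_site):
    """Base pair mismatches when the probe anneals to an equal-length target site."""
    if len(probe) != len(target_site):
        raise ValueError("probe and target site must have the same length")
    perfect_probe = reverse_complement(target_site)
    return sum(1 for p, e in zip(probe.upper(), perfect_probe) if p != e)


site = "AAGCTTGC"                           # hypothetical target site
print(reverse_complement(site))             # perfectly complementary probe
print(count_mismatches("GCAAGCTA", site))   # probe with one mismatch
```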

[68] A nucleic acid probe is generally single stranded but can be partially 
single and partially double stranded. The strandedness of the probe is dictated by the 
structure, composition, and properties of the target sequence. In general, the nucleic acid 
probes range from about 8 to about 100 bases long, with from about 10 to about 80 bases 
being preferred, and from about 30 to about 50 bases being particularly preferred. That is, 
generally whole genes are not used. In some embodiments, much longer nucleic acids can be 
used, up to hundreds of bases. 

[69] In a preferred embodiment, more than one probe per sequence is used, 
with either overlapping probes or probes to different sections of the target being used. That 
is, two, three, four or more probes, with three being preferred, are used to build in a 
redundancy for a particular target. The probes can be overlapping (i.e. have some sequence 
in common), or separate. 
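
One way to picture the redundancy described in paragraph [69] is to spread several probe sites evenly along a target. The sketch below returns the target regions that probes would be designed against (the probes themselves would be complementary to these regions). The even spacing, the default of three sites of 40 bases and the function name are assumptions for illustration; the length check reflects the general range of about 8 to about 100 bases stated in paragraph [68].

```python
def tile_probe_sites(target, probe_len=40, n_probes=3):
    """Return (start, site_sequence) pairs spread evenly across the target."""
    if not 8 <= probe_len <= 100:
        raise ValueError("probe length outside the general 8-100 base range")
    if len(target) < probe_len:
        raise ValueError("target is shorter than the probe length")
    if n_probes < 1:
        raise ValueError("at least one probe is required")
    last_start = len(target) - probe_len
    if n_probes == 1:
        starts = [0]
    else:
        starts = [round(k * last_start / (n_probes - 1)) for k in range(n_probes)]
    return [(s, target[s:s + probe_len]) for s in starts]


target = "ACGT" * 25                       # hypothetical 100-base target
for start, site in tile_probe_sites(target):
    print(start, site)
```

With a 100-base target the three 40-base sites start at positions 0, 30 and 60, so neighbouring sites overlap by 10 bases; a longer target would give separate, non-overlapping sites.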

[70] As will be appreciated by those in the art, nucleic acids can be 
attached or immobilized to a solid support in a wide variety of ways. By "immobilized" and 
grammatical equivalents herein is meant the association or binding between the nucleic acid 
probe and the solid support is sufficient to be stable under the conditions of binding, washing, 
analysis, and removal as outlined below. The binding can be covalent or non-covalent. By 
"non-covalent binding" and grammatical equivalents herein is meant one or more of either 
electrostatic, hydrophilic, and hydrophobic interactions. Included in non-covalent binding is 
the covalent attachment of a molecule, such as streptavidin, to the support and the 
non-covalent binding of the biotinylated probe to the streptavidin. By "covalent binding" and 
grammatical equivalents herein is meant that the two moieties, the solid support and the 
probe, are attached by at least one bond, including sigma bonds, pi bonds and coordination 
bonds. Covalent bonds can be formed directly between the probe and the solid support or can 
be formed by a cross linker or by inclusion of a specific reactive group on either the solid 
support or the probe or both molecules. Immobilization may also involve a combination of 
covalent and non-covalent interactions. 

[71] In general, the probes are attached to the biochip in a wide variety of 
ways, as will be appreciated by those in the art. As described herein, the nucleic acids can 
either be synthesized first, with subsequent attachment to the biochip, or can be directly 
synthesized on the biochip. 

[72] The biochip comprises a suitable solid substrate. By "substrate" or 
"solid support" or other grammatical equivalents herein is meant any material that can be 
modified to contain discrete individual sites appropriate for the attachment or association of 
the nucleic acid probes and is amenable to at least one detection method. As will be 
appreciated by those in the art, the number of possible substrates is very large; they include, 
but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, 
polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, 
polybutylene, polyurethanes, Teflon™, etc.), polysaccharides, nylon or nitrocellulose, resins, 
silica or silica-based materials including silicon and modified silicon, carbon, metals, 
inorganic glasses, plastics, etc. In general, the substrates allow optical detection and do not 
appreciably fluoresce. A preferred substrate is described in copending application entitled 
Reusable Low Fluorescent Plastic Biochip, U.S. Application Serial No. 09/270,214, filed 
March 15, 1999, herein incorporated by reference in its entirety. 

[73] Generally the substrate is planar, although as will be appreciated by 
those in the art, other configurations of substrates may be used as well. For example, the 
probes may be placed on the inside surface of a tube, for flow-through sample analysis to 
minimize sample volume. Similarly, the substrate may be flexible, such as a flexible foam, 
including closed cell foams made of particular plastics. 

[74] In a preferred embodiment, the surface of the biochip and the probe 
may be derivatized with chemical functional groups for subsequent attachment of the two. 
Thus, for example, the biochip is derivatized with a chemical functional group including, but 
not limited to, amino groups, carboxy groups, oxo groups and thiol groups, with amino 
groups being particularly preferred. Using these functional groups, the probes can be 
attached using functional groups on the probes. For example, nucleic acids containing amino 
groups can be attached to surfaces comprising amino groups, for example using linkers as are 
known in the art; for example, homo- or hetero-bifunctional linkers as are well known (see 
1994 Pierce Chemical Company catalog, technical section on cross-linkers, pages 155-200, 
incorporated herein by reference). In addition, in some cases, additional linkers, such as 
alkyl groups (including substituted and heteroalkyl groups) may be used. 

[75] In this embodiment, the oligonucleotides are synthesized as is known 
in the art, and then attached to the surface of the solid support. As will be appreciated by 
those skilled in the art, either the 5' or 3' terminus may be attached to the solid support, or 
attachment may be via an internal nucleoside. 

[76] In an additional embodiment, the immobilization to the solid support 
may be very strong, yet non-covalent. For example, biotinylated oligonucleotides can be 
made, which bind to surfaces covalently coated with streptavidin, resulting in attachment. 

[77] Alternatively, the oligonucleotides may be synthesized on the surface, 
as is known in the art. For example, photoactivation techniques utilizing photopolymerization 
compounds and techniques are used. In a preferred embodiment, the nucleic acids can be 
synthesized in situ, using well known photolithographic techniques, such as those described 
in WO 95/25116; WO 95/35505; U.S. Patent Nos. 5,700,637 and 5,445,934; and references 
cited within, all of which are expressly incorporated by reference; these methods of 
attachment form the basis of the Affymetrix GeneChip™ technology. 

[78] In a preferred embodiment, colorectal cancer nucleic acids encoding 
colorectal cancer proteins are used to make a variety of expression vectors to express 
colorectal cancer proteins which can then be used in screening assays, as described below. 
The expression vectors may be either self-replicating extrachromosomal vectors or vectors 
which integrate into a host genome. Generally, these expression vectors include 
transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid 
encoding the colorectal cancer protein. The term "control sequences" refers to DNA 
sequences necessary for the expression of an operably linked coding sequence in a particular 
host organism. The control sequences that are suitable for prokaryotes, for example, include 
a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells 
are known to utilize promoters, polyadenylation signals, and enhancers. 

[79] Nucleic acid is "operably linked" when it is placed into a functional 
relationship with another nucleic acid sequence. For example, DNA for a presequence or 
secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein 
that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked 
to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site 
is operably linked to a coding sequence if it is positioned so as to facilitate translation. 
Generally, "operably linked" means that the DNA sequences being linked are contiguous 
and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers 
do not have to be contiguous. Linking is accomplished by ligation at convenient restriction 
sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are used in 
accordance with conventional practice. The transcriptional and translational regulatory 
nucleic acid will generally be appropriate to the host cell used to express the colorectal cancer 
protein; for example, transcriptional and translational regulatory nucleic acid sequences from 
Bacillus are preferably used to express the colorectal cancer protein in Bacillus. Numerous 
types of appropriate expression vectors, and suitable regulatory sequences are known in the 
art for a variety of host cells. 

[80] In general, the transcriptional and translational regulatory sequences 
may include, but are not limited to, promoter sequences, ribosomal binding sites, 
transcriptional start and stop sequences, translational start and stop sequences, and enhancer 
or activator sequences. In a preferred embodiment, the regulatory sequences include a 
promoter and transcriptional start and stop sequences. 

[81] Promoter sequences encode either constitutive or inducible promoters. 
The promoters may be either naturally occurring promoters or hybrid promoters. Hybrid 
promoters, which combine elements of more than one promoter, are also known in the art, 
and are useful in the present invention. 

[82] In addition, the expression vector may comprise additional elements. 
For example, the expression vector may have two replication systems, thus allowing it to be 
maintained in two organisms, for example in mammalian or insect cells for expression and in 
a prokaryotic host for cloning and amplification. Furthermore, for integrating expression 
vectors, the expression vector contains at least one sequence homologous to the host cell 
genome, and preferably two homologous sequences which flank the expression construct. 
The integrating vector may be directed to a specific locus in the host cell by selecting the 
appropriate homologous sequence for inclusion in the vector. Constructs for integrating 
vectors are well known in the art. 

[83] In addition, in a preferred embodiment, the expression vector contains 
a selectable marker gene to allow the selection of transformed host cells. Selection genes are 
well known in the art and will vary with the host cell used. 

[84] The colorectal cancer proteins of the present invention are produced 
by culturing a host cell transformed with an expression vector containing nucleic acid 
encoding a colorectal cancer protein, under the appropriate conditions to induce or cause 
expression of the colorectal cancer protein. The conditions appropriate for colorectal cancer 
protein expression will vary with the choice of the expression vector and the host cell, and 
will be easily ascertained by one skilled in the art through routine experimentation. For 
example, the use of constitutive promoters in the expression vector will require optimizing 
the growth and proliferation of the host cell, while the use of an inducible promoter requires 
the appropriate growth conditions for induction. In addition, in some embodiments, the 
timing of the harvest is important. For example, the baculoviral systems used in insect cell 
expression are lytic viruses, and thus harvest time selection can be crucial for product yield. 

[85] Appropriate host cells include yeast, bacteria, archaebacteria, fungi, 
and insect and animal cells, including mammalian cells. Of particular interest are Drosophila 
melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, Sf9 
cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, HeLa cells, THP1 cell line (a 
macrophage cell line) and human cells and cell lines. 

[86] In a preferred embodiment, the colorectal cancer proteins are 
expressed in mammalian cells. Mammalian expression systems are also known in the art, and 
include retroviral systems. A preferred expression vector system is a retroviral vector system 
such as is generally described in PCT/US97/01019 and PCT/US97/01048, both of which are 
hereby expressly incorporated by reference. Of particular use as mammalian promoters are 
the promoters from mammalian viral genes, since the viral genes are often highly expressed 
and have a broad host range. Examples include the SV40 early promoter, mouse mammary 
tumor virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, 
and the CMV promoter. Typically, transcription termination and polyadenylation sequences 
recognized by mammalian cells are regulatory regions located 3' to the translation stop codon 
and thus, together with the promoter elements, flank the coding sequence. Examples of 
transcription terminator and polyadenylation signals include those derived from SV40. 

[87] The methods of introducing exogenous nucleic acid into mammalian 
hosts, as well as other hosts, are well known in the art, and will vary with the host cell used. 
Techniques include dextran-mediated transfection, calcium phosphate precipitation, 
polybrene mediated transfection, protoplast fusion, electroporation, viral infection, 
encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA 
into nuclei. 

[88] In a preferred embodiment, colorectal cancer proteins are expressed in 
bacterial systems. Bacterial expression systems are well known in the art. Promoters from 
bacteriophage may also be used and are known in the art. In addition, synthetic promoters 
and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and 
lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring 
promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and 
initiate transcription. In addition to a functioning promoter sequence, an efficient ribosome 
binding site is desirable. The expression vector may also include a signal peptide sequence 
that provides for secretion of the colorectal cancer protein in bacteria. The protein is either 
secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located 
between the inner and outer membrane of the cell (gram-negative bacteria). The bacterial 
expression vector may also include a selectable marker gene to allow for the selection of 
bacterial strains that have been transformed. Suitable selection genes include genes which 
render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, 
kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, 
such as those in the histidine, tryptophan and leucine biosynthetic pathways. These 
components are assembled into expression vectors. Expression vectors for bacteria are well 
known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, 
and Streptomyces lividans, among others. The bacterial expression vectors are transformed 
into bacterial host cells using techniques well known in the art, such as calcium chloride 
treatment, electroporation, and others. 

[89] In one embodiment, colorectal cancer proteins are produced in insect 
cells. Expression vectors for the transformation of insect cells, and in particular, 
baculovirus-based expression vectors, are well known in the art. 

[90] In a preferred embodiment, colorectal cancer protein is produced in 
yeast cells. Yeast expression systems are well known in the art, and include expression 
vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula 
polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris, 
Schizosaccharomyces pombe, and Yarrowia lipolytica. 

[91] The colorectal cancer protein may also be made as a fusion protein, 
using techniques well known in the art. Thus, for example, for the creation of monoclonal 
antibodies, if the desired epitope is small, the colorectal cancer protein may be fused to a 
carrier protein to form an immunogen. Alternatively, the colorectal cancer protein may be 
made as a fusion protein to increase expression, or for other reasons. For example, when the 
colorectal cancer protein is a colorectal cancer peptide, the nucleic acid encoding the peptide 
may be linked to other nucleic acid for expression purposes. 

[92] In one embodiment, the colorectal cancer nucleic acids, proteins and 
antibodies of the invention are labeled. By "labeled" herein is meant that a compound has at 
least one element, isotope or chemical compound attached to enable the detection of the 
compound. In general, labels fall into three classes: a) isotopic labels, which may be 
radioactive or heavy isotopes; b) immune labels, which may be antibodies or antigens; and c) 
colored or fluorescent dyes. The labels may be incorporated into the colorectal cancer 
nucleic acids, proteins and antibodies at any position. For example, the label should be 
capable of producing, either directly or indirectly, a detectable signal. The detectable moiety 
may be a radioisotope, such as 3H, 14C, 32P, 35S, or 125I, a fluorescent or 
chemiluminescent compound, such as fluorescein isothiocyanate, rhodamine, or luciferin, or 
an enzyme, such as alkaline phosphatase, beta-galactosidase or horseradish peroxidase. Any 
method known in the art for conjugating the antibody to the label may be employed, 
including those methods described by Hunter et al., Nature, 144:945 (1962); David et al., 
Biochemistry, 13:1014 (1974); Pain et al., J. Immunol. Meth., 40:219 (1981); and Nygren, J. 
Histochem. and Cytochem., 30:407 (1982). 

[93] Accordingly, the present invention also provides colorectal cancer 
protein sequences. A colorectal cancer protein of the present invention may be identified in 
several ways. "Protein" in this sense includes proteins, polypeptides, and peptides, terms 
which are used interchangeably herein to refer to a polymer of amino acid residues. The 
terms apply to amino acid polymers in which one or more amino acid residue is an artificial 
chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally 
occurring amino acid polymers, those containing modified residues, and non-naturally 
occurring amino acid polymers. 

[94] As will be appreciated by those in the art, the nucleic acid sequences 
of the invention can be used to generate protein sequences. There are a variety of ways to do 
this, including cloning the entire gene and verifying its frame and amino acid sequence, or by 
comparing it to known sequences to search for homology to provide a frame, assuming the 
colorectal cancer protein has homology to some protein in the database being used. 
Generally, the nucleic acid sequences are input into a program that will search all three 
frames for homology. This is done in a preferred embodiment using the following NCBI 
Advanced BLAST parameters. The program is blastx or blastn. The database is nr. The 
input data is as "Sequence in FASTA format". The organism list is "none". The "expect" is 
10; the filter is default. The "descriptions" is 500, the "alignments" is 500, and the 
"alignment view" is pairwise. The "Query Genetic Codes" is standard (1). The matrix is 
BLOSUM62; gap existence cost is 11, per residue gap cost is 1; and the lambda ratio is .85 
default. This results in the generation of a putative protein sequence. 
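
By way of illustration only, the frame search described above may be sketched as a translation of a nucleic acid sequence in each of the three forward reading frames under the standard genetic code (1). The function names and the example sequence are hypothetical and are not part of the BLAST program; the homology search against the nr database itself is not reproduced here.

```python
# Standard genetic code (NCBI translation table 1), laid out in the
# conventional TCAG ordering of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    dna = dna.upper()
    # Codons containing characters other than A, C, G, T are reported as 'X'.
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def three_frame_translation(dna: str) -> dict:
    """Return the putative translation for forward frames 1, 2 and 3."""
    return {frame + 1: translate(dna[frame:]) for frame in range(3)}


if __name__ == "__main__":
    # Hypothetical input sequence, for illustration only.
    for frame, protein in three_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Each translated frame would then be compared to the database, and the frame showing homology would be taken as providing the putative protein sequence.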

[95] Also included within one embodiment of colorectal cancer proteins 
are amino acid variants of the naturally occurring sequences, as determined herein. 
Preferably, the variants are greater than about 75% homologous to the wild-type 
sequence, more preferably greater than about 80%, even more preferably greater than about 
85% and most preferably greater than 90%. In some embodiments the homology will be as 
high as about 93 to 95 or 98%. As for nucleic acids, homology in this context means 
sequence similarity or identity, with identity being preferred. This homology will be 
determined using standard techniques known in the art as are outlined above for the nucleic 
acid homologies. 
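
As a non-limiting sketch, the percentage thresholds recited above may be applied to a pairwise alignment as follows. The calculation assumes that homology is measured as percent identity over the aligned columns, which is only one of several conventions; the function names, the gap character and the tier labels are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences contain a gap.
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


def homology_tier(pct: float) -> str:
    """Map a percentage onto the preference tiers recited in the text."""
    tiers = [
        (90.0, "most preferred (greater than 90%)"),
        (85.0, "even more preferred (greater than about 85%)"),
        (80.0, "more preferred (greater than about 80%)"),
        (75.0, "preferred (greater than about 75%)"),
    ]
    for threshold, label in tiers:
        if pct > threshold:
            return label
    return "below the preferred range"


if __name__ == "__main__":
    # Hypothetical aligned sequences differing at one of eight positions.
    pct = percent_identity("MKTAYIAK", "MKTAYLAK")
    print(pct, homology_tier(pct))
```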

[96] Colorectal cancer proteins of the present invention may be shorter or 
longer than the wild type amino acid sequences. Thus, in a preferred embodiment, included 
within the definition of colorectal cancer proteins are portions or fragments of the wild type 
sequences, herein. In addition, as outlined above, the colorectal cancer nucleic acids of the 
invention may be used to obtain additional coding regions, and thus additional protein 
sequence, using techniques known in the art. 

[97] In a preferred embodiment, the colorectal cancer proteins are 
derivative or variant colorectal cancer proteins as compared to the wild-type sequence. That 
is, as outlined more fully below, the derivative colorectal cancer peptide will contain at least 
one amino acid substitution, deletion or insertion, with amino acid substitutions being 
particularly preferred. The amino acid substitution, insertion or deletion may occur at any 
residue within the colorectal cancer peptide. 

[98] Also included in an embodiment of colorectal cancer proteins of the 
present invention are amino acid sequence variants. These variants fall into one or more of 
three classes: substitutional, insertional or deletional variants. These variants ordinarily are 
prepared by site specific mutagenesis of nucleotides in the DNA encoding the colorectal 
cancer protein, using cassette or PCR mutagenesis or other techniques well known in the art, 
to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell 
culture as outlined above. However, variant colorectal cancer protein fragments having up to 
about 100-150 residues may be prepared by in vitro synthesis using established techniques. 
Amino acid sequence variants are characterized by the predetermined nature of the variation, 
a feature that sets them apart from naturally occurring allelic or interspecies variation of the 
colorectal cancer protein amino acid sequence. The variants typically exhibit the same 
qualitative biological activity as the naturally occurring analogue, although variants can also 
be selected which have modified characteristics as will be more fully outlined below. 

[99] While the site or region for introducing an amino acid sequence 
variation is predetermined, the mutation per se need not be predetermined. For example, in 
order to optimize the performance of a mutation at a given site, random mutagenesis may be 
conducted at the target codon or region and the expressed colorectal cancer variants screened 
for the optimal combination of desired activity. Techniques for making substitution 
mutations at predetermined sites in DNA having a known sequence are well known, for 
example, M13 primer mutagenesis and PCR mutagenesis. Screening of the mutants is done 
using assays of colorectal cancer protein activities. 
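
For purposes of illustration, the set of variants obtainable at a single predetermined site may be enumerated as sketched below. The sketch works at the protein level and assumes that each of the twenty common amino acids is sampled once at the target position; the function name, the variant naming scheme and the example sequence are hypothetical, and the screening assay itself is not modeled.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty common amino acids


def site_saturation_variants(protein: str, position: int) -> dict:
    """Return every single-residue substitution at a 1-based position.

    Keys name the variant as <wild type><position><substitute>, e.g. 'K2R';
    values are the full variant sequences.
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position lies outside the protein sequence")
    wild_type = protein[position - 1]
    variants = {}
    for residue in AMINO_ACIDS:
        if residue == wild_type:
            continue  # the wild-type residue is not a variant
        name = f"{wild_type}{position}{residue}"
        variants[name] = protein[:position - 1] + residue + protein[position:]
    return variants


if __name__ == "__main__":
    # Hypothetical five-residue peptide, mutated at position 2.
    for name, sequence in site_saturation_variants("MKTAY", 2).items():
        print(name, sequence)
```

Each enumerated variant would then be expressed and screened for the desired activity.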

[100] Amino acid substitutions are typically of single residues; insertions 
usually will be on the order of from about 1 to 20 amino acids, although considerably larger 
insertions may be tolerated. Deletions range from about 1 to about 20 residues, although in 
some cases deletions may be much larger. 

[101] Substitutions, deletions, insertions or any combination thereof may be 
used to arrive at a final derivative. Generally these changes are done on a few amino acids to 
minimize the alteration of the molecule. However, larger changes may be tolerated in certain 
circumstances. When small alterations in the characteristics of the colorectal cancer protein 
are desired, substitutions are generally made in accordance with the following chart: 

Chart I 

Original Residue    Exemplary Substitutions 
...                 Ile, Val 
Lys                 Arg, Gln, Glu 
Met                 Leu, Ile 
Phe                 Met, Leu, Tyr 
Ser                 ... 
Trp                 Tyr 
...                 Trp, Phe 

[102] Substantial changes in function or immunological identity are made by 
selecting substitutions that are less conservative than those shown in Chart I. For example, 
substitutions may be made which more significantly affect: the structure of the polypeptide 
backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure; 
the charge or hydrophobicity of the molecule at the target site; or the bulk of the side chain. 
The substitutions which in general are expected to produce the greatest changes in the 
polypeptide's properties are those in which (a) a hydrophilic residue, e.g. seryl or threonyl is 
substituted for (or by) a hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or 
alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue 
having an electropositive side chain, e.g. lysyl, arginyl, or histidyl, is substituted for (or by) 
an electronegative residue, e.g. glutamyl or aspartyl; or (d) a residue having a bulky side 
chain, e.g. phenylalanine, is substituted for (or by) one not having a side chain, e.g. glycine. 
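
The four classes (a) through (d) of substitutions expected to produce the greatest changes may be sketched, by way of example only, as a simple lookup. The residue sets below contain only the exemplary residues named in the text (each introduced there by "e.g.") and are therefore not exhaustive; the one-letter codes, set names and function names are hypothetical.

```python
# Exemplary residues named in the text, in one-letter code (not exhaustive).
HYDROPHILIC = {"S", "T"}                 # seryl, threonyl
HYDROPHOBIC = {"L", "I", "F", "V", "A"}  # leucyl, isoleucyl, phenylalanyl, valyl, alanyl
ELECTROPOSITIVE = {"K", "R", "H"}        # lysyl, arginyl, histidyl
ELECTRONEGATIVE = {"E", "D"}             # glutamyl, aspartyl
BULKY = {"F"}                            # phenylalanine
NO_SIDE_CHAIN = {"G"}                    # glycine
CYS_OR_PRO = {"C", "P"}                  # cysteine, proline


def _swapped(a: str, b: str, first: set, second: set) -> bool:
    """True if one residue is in the first set and the other in the second."""
    return (a in first and b in second) or (a in second and b in first)


def disruptive_classes(original: str, substitute: str) -> list:
    """Return the classes (a)-(d) under which a substitution falls."""
    if original == substitute:
        return []
    classes = []
    if _swapped(original, substitute, HYDROPHILIC, HYDROPHOBIC):
        classes.append("a")
    if original in CYS_OR_PRO or substitute in CYS_OR_PRO:
        classes.append("b")
    if _swapped(original, substitute, ELECTROPOSITIVE, ELECTRONEGATIVE):
        classes.append("c")
    if _swapped(original, substitute, BULKY, NO_SIDE_CHAIN):
        classes.append("d")
    return classes


if __name__ == "__main__":
    for pair in [("S", "L"), ("C", "A"), ("K", "D"), ("F", "G"), ("L", "I")]:
        print(pair, disruptive_classes(*pair))
```

An empty result indicates only that the substitution falls under none of the four exemplary classes, not that it is necessarily conservative.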

[103] The variants typically exhibit the same qualitative biological activity 
and will elicit the same immune response as the naturally-occurring analogue, although 
variants also are selected to modify the characteristics of the colorectal cancer proteins as 
needed. Alternatively, the variant may be designed such that the biological activity of the 
colorectal cancer protein is altered. For example, glycosylation sites may be altered or 
removed. 

[104] Covalent modifications of colorectal cancer polypeptides are included 
within the scope of this invention. One type of covalent modification includes reacting 
targeted amino acid residues of a colorectal cancer polypeptide with an organic derivatizing 
agent that is capable of reacting with selected side chains or the N-or C-terminal residues of a 
colorectal cancer polypeptide. Derivatization with bifunctional agents is useful, for instance, 
for crosslinking colorectal cancer polypeptides to a water-insoluble support matrix or surface for use in 
the method for purifying anti-colorectal cancer antibodies or screening assays, as is more 
fully described below. Commonly used crosslinking agents include, e.g., 
1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxy-succinimide esters, for example, esters 
with 4-azido-salicylic acid, homobifunctional imidoesters, including disuccinimidyl esters 
such as 3,3'-dithiobis-(succinimidyl-propionate), bifunctional maleimides such as 
bis-N-maleimido-1,8-octane and agents such as methyl-3-[(p-azidophenyl)-dithio]propioimidate. 

[105] Other modifications include deamidation of glutaminyl and 
asparaginyl residues to the corresponding glutamyl and aspartyl residues, respectively, 
hydroxylation of proline and lysine, phosphorylation of hydroxyl groups of seryl, threonyl or 
tyrosyl residues, methylation of the α-amino groups of lysine, arginine, and histidine side 
chains [T.E. Creighton, Proteins: Structure and Molecular Properties, W.H. Freeman & Co., 
San Francisco, pp. 79-86 (1983)], acetylation of the N-terminal amine, and amidation of any 
C-terminal carboxyl group. 

[106] Another type of covalent modification of the colorectal cancer 
polypeptide included within the scope of this invention comprises altering the native 
glycosylation pattern of the polypeptide. "Altering the native glycosylation pattern" is 
intended for purposes herein to mean deleting one or more carbohydrate moieties found in 
native sequence colorectal cancer polypeptide, and/or adding one or more glycosylation sites 
that are not present in the native sequence colorectal cancer polypeptide. 

[107] Addition of glycosylation sites to colorectal cancer polypeptides may 
be accomplished by altering the amino acid sequence thereof. The alteration may be made, 
for example, by the addition of, or substitution by, one or more serine or threonine residues to 
the native sequence colorectal cancer polypeptide (for O-linked glycosylation sites). The 
colorectal cancer amino acid sequence may optionally be altered through changes at the 
DNA level, particularly by mutating the DNA encoding the colorectal cancer polypeptide at 
preselected bases such that codons are generated that will translate into the desired amino 
acids. 

[108] Another means of increasing the number of carbohydrate moieties on 
the colorectal cancer polypeptide is by chemical or enzymatic coupling of glycosides to the 
polypeptide. Such methods are described in the art, e.g., in WO 87/05330 published 11 
September 1987, and in Aplin and Wriston, CRC Crit. Rev. Biochem., pp. 259-306 (1981). 

[109] Removal of carbohydrate moieties present on the colorectal cancer 
polypeptide may be accomplished chemically or enzymatically or by mutational substitution 
of codons encoding for amino acid residues that serve as targets for glycosylation. Chemical 
deglycosylation techniques are known in the art and described, for instance, by Hakimuddin, 
et al., Arch. Biochem. Biophys., 259:52 (1987) and by Edge et al., Anal. Biochem., 118:131 
(1981). Enzymatic cleavage of carbohydrate moieties on polypeptides can be achieved by the 
use of a variety of endo- and exo-glycosidases as described by Thotakura et al., Meth. 
Enzymol., 138:350 (1987). 

[110] Another type of covalent modification of colorectal cancer polypeptides comprises 
linking the colorectal cancer polypeptide to one of a variety of nonproteinaceous polymers, 
e.g., polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the manner set forth 
in U.S. Patent Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192 or 4,179,337. 

[111] Colorectal cancer polypeptides of the present invention may also be 
modified in a way to form chimeric molecules comprising a colorectal cancer polypeptide 
fused to another, heterologous polypeptide or amino acid sequence. In one embodiment, such 
a chimeric molecule comprises a fusion of a colorectal cancer polypeptide with a tag 
polypeptide which provides an epitope to which an anti-tag antibody can selectively bind. 
The epitope tag is generally placed at the amino- or carboxyl-terminus of the colorectal cancer 
polypeptide. The presence of such epitope-tagged forms of a colorectal cancer polypeptide 
can be detected using an antibody against the tag polypeptide. Also, provision of the epitope 
tag enables the colorectal cancer polypeptide to be readily purified by affinity purification 
using an anti-tag antibody or another type of affinity matrix that binds to the epitope tag. In 
an alternative embodiment, the chimeric molecule may comprise a fusion of a colorectal 
cancer polypeptide with an immunoglobulin or a particular region of an immunoglobulin. 
For a bivalent form of the chimeric molecule, such a fusion could be to the Fc region of an 
IgG molecule. 

[112] Various tag polypeptides and their respective antibodies are well 
known in the art. Examples include poly-histidine (poly-his) or poly-histidine-glycine (poly- 
his-gly) tags; the flu HA tag polypeptide and its antibody 12CA5 [Field et al., Mol. Cell. 
Biol., 8:2159-2165 (1988)]; the c-myc tag and the 8F9, 3C7, 6E10, G4, B7 and 9E10 
antibodies thereto [Evan et al., Molecular and Cellular Biology, 5:3610-3616 (1985)]; and the 
Herpes Simplex virus glycoprotein D (gD) tag and its antibody [Paborsky et al., Protein 
Engineering, 3(6):547-553 (1990)]. Other tag polypeptides include the Flag-peptide [Hopp et 
al., BioTechnology, 6:1204-1210 (1988)]; the KT3 epitope peptide [Martin et al., Science, 
255:192-194 (1992)]; tubulin epitope peptide [Skinner et al., J. Biol. Chem., 
266:15163-15166 (1991)]; and the T7 gene 10 protein peptide tag [Lutz-Freyermuth et al., Proc. Natl. 
Acad. Sci. USA, 87:6393-6397 (1990)]. 

[113] Also included with the definition of colorectal cancer protein in one 
embodiment are other colorectal cancer proteins of the colorectal cancer family, and 
colorectal cancer proteins from other organisms, which are cloned and expressed as outlined 
below. Thus, probe or degenerate polymerase chain reaction (PCR) primer sequences may be 
used to find other related colorectal cancer proteins from humans or other organisms. As 
will be appreciated by those in the art, particularly useful probe and/or PCR primer sequences 
include the unique areas of the colorectal cancer nucleic acid sequence. As is generally 
known in the art, preferred PCR primers are from about 15 to about 35 nucleotides in length, 
with from about 20 to about 30 being preferred, and may contain inosine as needed. The 
conditions for the PCR reaction are well known in the art. 
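
The primer length ranges recited above, together with the notion of a degenerate primer, may be illustrated by the following sketch. It treats the "about" limits as exact bounds and expands degenerate positions using the standard IUPAC nucleotide codes; inosine is not modeled. The function names and example primers are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (inosine is deliberately omitted).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_length_class(primer: str) -> str:
    """Classify a primer by the length ranges recited in the text."""
    n = len(primer)
    if 20 <= n <= 30:
        return "preferred"
    if 15 <= n <= 35:
        return "acceptable"
    return "outside range"


def expand_degenerate(primer: str) -> list:
    """List every concrete sequence represented by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combination) for combination in product(*pools)]


if __name__ == "__main__":
    # Hypothetical degenerate primer fragment: R = A or G, Y = C or T.
    print(primer_length_class("ATGGCNAARYTNGGNGAYTT"))
    print(expand_degenerate("ATGR"))
```

The number of sequences grows multiplicatively with each degenerate position, which is one reason a primer may instead contain inosine at highly degenerate positions.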

[114] In addition, as is outlined herein, colorectal cancer proteins can be 
made that are longer than those depicted in Table 1 or Table 2, for example, by the 
elucidation of additional sequences, the addition of epitope or purification tags, the addition 
of other fusion sequences, etc. 

[115] Colorectal cancer proteins may also be identified as being encoded by 
colorectal cancer nucleic acids. Thus, colorectal cancer proteins are encoded by nucleic 
acids that will hybridize to the sequences of the sequence listings, or their complements, as 
outlined herein. 

[116] In a preferred embodiment, when the colorectal cancer protein is to be 
used to generate antibodies, for example for immunotherapy, the colorectal cancer protein 
should share at least one epitope or determinant with the full length protein. By "epitope" or 
"determinant" herein is meant a portion of a protein which will generate and/or bind an 
antibody or T-cell receptor in the context of MHC. Thus, in most instances, antibodies made 
to a smaller colorectal cancer protein will be able to bind to the full length protein. In a 
preferred embodiment, the epitope is unique; that is, antibodies generated to a unique epitope 
show little or no cross-reactivity. In a preferred embodiment, the epitope is selected from a 
peptide encoded by a nucleic acid of Table 1. In another preferred embodiment, the epitope is 
selected from the CBF9 peptide sequence shown in Table 2. 

[117] In one embodiment, the term "antibody" includes antibody fragments, 
as are known in the art, including Fab, Fab2, single chain antibodies (Fv for example), 
chimeric antibodies, etc., either produced by the modification of whole antibodies or those 
synthesized de novo using recombinant DNA technologies. 

[118] Methods of preparing polyclonal antibodies are known to the skilled 
artisan. Polyclonal antibodies can be raised in a mammal, for example, by one or more 
injections of an immunizing agent and, if desired, an adjuvant. Typically, the immunizing 
agent and/or adjuvant will be injected in the mammal by multiple subcutaneous or 
intraperitoneal injections. The immunizing agent may include the CBF9 peptide of Table 2, 
or a peptide encoded by a nucleic acid of Table 1 or fragment thereof or a fusion protein 
thereof. It may be useful to conjugate the immunizing agent to a protein known to be 
immunogenic in the mammal being immunized. Examples of such immunogenic proteins 
include but are not limited to keyhole limpet hemocyanin, serum albumin, bovine 
thyroglobulin, and soybean trypsin inhibitor. Examples of adjuvants which may be employed 
include Freund's complete adjuvant and MPL-TDM adjuvant (monophosphoryl Lipid A, 
synthetic trehalose dicorynomycolate). The immunization protocol may be selected by one 
skilled in the art without undue experimentation. 

[119] The antibodies may, alternatively, be monoclonal antibodies. 
Monoclonal antibodies may be prepared using hybridoma methods, such as those described 
by Kohler and Milstein, Nature, 256:495 (1975). In a hybridoma method, a mouse, hamster, 
or other appropriate host animal, is typically immunized with an immunizing agent to elicit 
lymphocytes that produce or are capable of producing antibodies that will specifically bind to 
the immunizing agent. Alternatively, the lymphocytes may be immunized in vitro. The 
immunizing agent will typically include the CBF9 polypeptide or a peptide encoded by a 
nucleic acid of Table 1 or a fragment thereof or a fusion protein thereof. Generally, either 
peripheral blood lymphocytes ("PBLs") are used if cells of human origin are desired, or 
spleen cells or lymph node cells are used if non-human mammalian sources are desired. The 
lymphocytes are then fused with an immortalized cell line using a suitable fusing agent, such 
as polyethylene glycol, to form a hybridoma cell [Goding, Monoclonal Antibodies: Principles 
and Practice, Academic Press, (1986) pp. 59-103]. Immortalized cell lines are usually 
transformed mammalian cells, particularly myeloma cells of rodent, bovine and human 
origin. Usually, rat or mouse myeloma cell lines are employed. The hybridoma cells may be 
cultured in a suitable culture medium that preferably contains one or more substances that 
inhibit the growth or survival of the unfused, immortalized cells. For example, if the parental 
cells lack the enzyme hypoxanthine guanine phosphoribosyl transferase (HGPRT or HPRT), 
the culture medium for the hybridomas typically will include hypoxanthine, aminopterin, and 
thymidine ("HAT medium"), which substances prevent the growth of HGPRT-deficient cells. 

[120] In one embodiment, the antibodies are bispecific antibodies. 
Bispecific antibodies are monoclonal, preferably human or humanized, antibodies that have 
binding specificities for at least two different antigens. In the present case, one of the binding 
specificities is for a colorectal cancer protein or a fragment thereof, the other one is for any 
other antigen, and preferably for a cell-surface protein or receptor or receptor subunit, 
preferably one that is tumor specific. 

[121] In a preferred embodiment, the antibodies to colorectal cancer proteins are 
capable of reducing or eliminating the biological function of colorectal cancer proteins, as is 
described below. That is, the addition of anti-colorectal cancer antibodies (either polyclonal 
or preferably monoclonal) to colorectal cancer proteins (or cells containing colorectal cancer 
proteins) may reduce or eliminate the colorectal cancer protein activity. Generally, at least a 25% decrease in 
activity is preferred, with at least about 50% being particularly preferred and about a 95- 
100% decrease being especially preferred. 
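
The percentage decreases recited above may be computed, as a worked illustration, from an activity measured without and with antibody. The sketch assumes that activity is expressed in arbitrary but consistent units and treats the recited percentages as exact thresholds; the function names and tier labels are hypothetical.

```python
def percent_decrease(baseline: float, treated: float) -> float:
    """Percent decrease in activity relative to the untreated baseline."""
    if baseline <= 0:
        raise ValueError("baseline activity must be positive")
    return 100.0 * (baseline - treated) / baseline


def inhibition_tier(pct: float) -> str:
    """Map a percent decrease onto the preference tiers recited in the text."""
    if pct >= 95.0:
        return "especially preferred"
    if pct >= 50.0:
        return "particularly preferred"
    if pct >= 25.0:
        return "preferred"
    return "below the preferred threshold"


if __name__ == "__main__":
    # Hypothetical assay readings: 200 units without antibody, 80 units with.
    pct = percent_decrease(200.0, 80.0)
    print(pct, inhibition_tier(pct))
```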

[122] In a preferred embodiment the antibodies to the colorectal cancer 
proteins are humanized antibodies. Humanized forms of non-human (e.g., murine) antibodies 
are chimeric molecules of immunoglobulins, immunoglobulin chains or fragments thereof 
(such as Fv, Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which 
contain minimal sequence derived from non-human immunoglobulin. Humanized antibodies 
include human immunoglobulins (recipient antibody) in which residues from a 
complementarity determining region (CDR) of the recipient are replaced by residues from a 
CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired 
specificity, affinity and capacity. In some instances, Fv framework residues of the human 
immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies 
may also comprise residues which are found neither in the recipient antibody nor in the 
imported CDR or framework sequences. In general, the humanized antibody will comprise 
substantially all of at least one, and typically two, variable domains, in which all or 
substantially all of the CDR regions correspond to those of a non-human immunoglobulin 
and all or substantially all of the FR regions are those of a human immunoglobulin consensus 
sequence. The humanized antibody optimally also will comprise at least a portion of an 
immunoglobulin constant region (Fc), typically that of a human immunoglobulin [Jones et 
al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-327 (1988); and Presta,
Curr. Op. Struct. Biol., 2:593-596 (1992)]. 

[123] Methods for humanizing non-human antibodies are well known in the 
art. Generally, a humanized antibody has one or more amino acid residues introduced into it 
from a source which is non-human. These non-human amino acid residues are often referred 
to as import residues, which are typically taken from an import variable domain. 
Humanization can be essentially performed following the method of Winter and co-workers 
[Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-327 (1988); 
Verhoeyen et al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR 
sequences for the corresponding sequences of a human antibody. Accordingly, such 
humanized antibodies are chimeric antibodies (U.S. Patent No. 4,816,567), wherein 
substantially less than an intact human variable domain has been substituted by the 
corresponding sequence from a non-human species. In practice, humanized antibodies are 
typically human antibodies in which some CDR residues and possibly some FR residues are 
substituted by residues from analogous sites in rodent antibodies. 

[124] Human antibodies can also be produced using various techniques 
known in the art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 
227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. 
and Boerner et al. are also available for the preparation of human monoclonal antibodies 
[Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985) and
Boerner et al., J. Immunol., 147(1):86-95 (1991)]. Similarly, human antibodies can be made
by introducing human immunoglobulin loci into transgenic animals, e.g., mice in which
the endogenous immunoglobulin genes have been partially or completely inactivated. Upon 
challenge, human antibody production is observed, which closely resembles that seen in 
humans in all respects, including gene rearrangement, assembly, and antibody repertoire.
This approach is described, for example, in U.S. Patent Nos. 5,545,807; 5,545,806;
5,569,825; 5,625,126; 5,633,425; 5,661,016, and in the following scientific publications:
Marks et al., Biotechnology 10, 779-783 (1992); Lonberg et al., Nature 368, 856-859 (1994);
Morrison, Nature 368, 812-13 (1994); Fishwild et al., Nature Biotechnology 14, 845-51
(1996); Neuberger, Nature Biotechnology 14, 826 (1996); Lonberg and Huszar, Intern. Rev.
Immunol. 13, 65-93 (1995).

[125] By immunotherapy is meant treatment of colorectal cancer with an
antibody raised against colorectal cancer proteins. As used herein, immunotherapy can be
passive or active. Passive immunotherapy as defined herein is the passive transfer of
antibody to a recipient (patient). Active immunization is the induction of antibody and/or T-
cell responses in a recipient (patient). Induction of an immune response is the result of
providing the recipient with an antigen to which antibodies are raised. As appreciated by one
of ordinary skill in the art, the antigen may be provided by injecting a polypeptide against
which antibodies are desired to be raised into a recipient, or contacting the recipient with a
nucleic acid capable of expressing the antigen and under conditions for expression of the
antigen.

[126] In a preferred embodiment the colorectal cancer proteins against
which antibodies are raised are secreted proteins as described above. Without being bound
by theory, antibodies used for treatment bind and prevent the secreted protein from binding
to its receptor, thereby inactivating the secreted colorectal cancer protein.

[127] In another preferred embodiment, the colorectal cancer protein to
which antibodies are raised is a transmembrane protein. Without being bound by theory,
antibodies used for treatment bind the extracellular domain of the colorectal cancer protein
and prevent it from binding to other proteins, such as circulating ligands or cell-associated
molecules. The antibody may cause down-regulation of the transmembrane colorectal cancer
protein. As will be appreciated by one of ordinary skill in the art, the antibody may be a
competitive, non-competitive or uncompetitive inhibitor of protein binding to the
extracellular domain of the colorectal cancer protein. The antibody is also an antagonist of
the colorectal cancer protein. Further, the antibody prevents activation of the transmembrane
colorectal cancer protein. In one aspect, when the antibody prevents the binding of other
molecules to the colorectal cancer protein, the antibody prevents growth of the cell. The
antibody also sensitizes the cell to cytotoxic agents, including, but not limited to, TNF-α,
TNF-β, IL-1, IFN-γ and IL-2, or chemotherapeutic agents including 5FU, vinblastine,
actinomycin D, cisplatin, methotrexate, and the like. In some instances the antibody belongs
to a sub-type that activates serum complement when complexed with the transmembrane
protein, thereby mediating cytotoxicity. Thus, colorectal cancer is treated by administering to
a patient antibodies directed against the transmembrane colorectal cancer protein.

[128] In another preferred embodiment, the antibody is conjugated to a 
therapeutic moiety. In one aspect the therapeutic moiety is a small molecule that modulates 
the activity of the colorectal cancer protein. In another aspect the therapeutic moiety 
modulates the activity of molecules associated with or in close proximity to the colorectal 
cancer protein. The therapeutic moiety may inhibit enzymatic activity such as protease or 
protein kinase activity associated with colorectal cancer.

[129] In a preferred embodiment, the therapeutic moiety may also be a 
cytotoxic agent. In this method, targeting the cytotoxic agent to tumor tissue or cells results
in a reduction in the number of afflicted cells, thereby reducing symptoms associated with
colorectal cancer. Cytotoxic agents are numerous and varied and include, but are not limited
to, cytotoxic drugs or toxins or active fragments of such toxins. Suitable toxins and their
corresponding fragments include diphtheria A chain, exotoxin A chain, ricin A chain, abrin A
chain, curcin, crotin, phenomycin, enomycin and the like. Cytotoxic agents also include 
radiochemicals made by conjugating radioisotopes to antibodies raised against colorectal 
cancer proteins, or binding of a radionuclide to a chelating agent that has been covalently 
attached to the antibody. Targeting the therapeutic moiety to transmembrane colorectal 
cancer proteins not only serves to increase the local concentration of therapeutic moiety in 
the colorectal cancer afflicted area, but also serves to reduce deleterious side effects that may 
be associated with the therapeutic moiety. 

[130] In another preferred embodiment, the colorectal cancer protein against
which the antibodies are raised is an intracellular protein. In this case, the antibody may be 
conjugated to a protein which facilitates entry into the cell. In one case, the antibody enters 
the cell by endocytosis. In another embodiment, a nucleic acid encoding the antibody is 
administered to the individual or cell. Moreover, where the colorectal cancer protein can
be targeted within a cell, e.g., in the nucleus, an antibody thereto contains a signal for that
target localization, e.g., a nuclear localization signal.

[131] The colorectal cancer antibodies of the invention specifically bind to 
colorectal cancer proteins. By "specifically bind" herein is meant that the antibodies bind to 
the protein with a binding constant in the range of at least 10⁻⁴ - 10⁻⁶ M⁻¹, with a preferred
range being 10⁻⁷ - 10⁻⁹ M⁻¹.

[132] In a preferred embodiment, the colorectal cancer protein is purified or
isolated after expression. Colorectal cancer proteins may be isolated or purified in a variety 
of ways known to those skilled in the art depending on what other components are present in 
the sample. Standard purification methods include electrophoretic, molecular, 
immunological and chromatographic techniques, including ion exchange, hydrophobic, 
affinity, and reverse-phase HPLC chromatography, and chromatofocusing. For example, the 
colorectal cancer protein may be purified using a standard anti-colorectal cancer antibody 
column. Ultrafiltration and diafiltration techniques, in conjunction with protein 
concentration, are also useful. For general guidance in suitable purification techniques, see 
Scopes, R., Protein Purification, Springer-Verlag, NY (1982). The degree of purification
necessary will vary depending on the use of the colorectal cancer protein. In some instances 
no purification will be necessary. 

[133] Once expressed and purified if necessary, the colorectal cancer 
proteins and nucleic acids are useful in a number of applications. 

[134] In one aspect, the expression levels of genes are determined for 
different cellular states in the colorectal cancer phenotype; that is, the expression levels of 
genes in normal colon tissue and in colorectal cancer tissue (and in some cases, for varying 
severities of colorectal cancer that relate to prognosis, as outlined below) are evaluated to 
provide expression profiles. An expression profile of a particular cell state or point of 
development is essentially a "fingerprint" of the state; while two states may have any 
particular gene similarly expressed, the evaluation of a number of genes simultaneously 
allows the generation of a gene expression profile that is unique to the state of the cell. By 
comparing expression profiles of cells in different states, information regarding which genes 
are important (including both up- and down-regulation of genes) in each of these states is 
obtained. Then, diagnosis may be done or confirmed: does tissue from a particular patient
have the gene expression profile of normal or colorectal cancer tissue?
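
The comparison of a patient profile against reference profiles can be illustrated with a minimal sketch. The specification does not name a similarity measure, so the use of Pearson correlation here is an assumption, as are the function names, the four-gene profiles and all numeric values, which are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def classify_profile(sample, references):
    """Return the label of the reference profile best correlated with the sample."""
    return max(references, key=lambda label: pearson(sample, references[label]))


# Hypothetical expression values for four illustrative genes (not real data).
references = {
    "normal": [1.0, 5.0, 2.0, 8.0],
    "colorectal cancer": [6.0, 1.5, 7.0, 2.0],
}
patient = [5.5, 2.0, 6.5, 3.0]
label = classify_profile(patient, references)
print(label)
```

In practice a profile would span many more genes, and any similarity measure suited to the assay platform could be substituted.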

[135] "Differential expression," or grammatical equivalents as used herein,
refers to both qualitative as well as quantitative differences in the genes' temporal and/or 
cellular expression patterns within and among the cells. Thus, a differentially expressed gene 
can qualitatively have its expression altered, including an activation or inactivation, in, for 
example, normal versus colorectal cancer tissue. That is, genes may be turned on or turned 
off in a particular state, relative to another state. As is apparent to the skilled artisan, any 
comparison of two or more states can be made. Such a qualitatively regulated gene will 
exhibit an expression pattern within a state or cell type which is detectable by standard
techniques in one such state or cell type, but is not detectable in both. Alternatively, the
determination is quantitative in that expression is increased or decreased; that is, the 
expression of the gene is either upregulated, resulting in an increased amount of transcript, or 
downregulated, resulting in a decreased amount of transcript. The degree to which 
expression differs need only be large enough to quantify via standard characterization 
techniques as outlined below, such as by use of Affymetrix GeneChip™ expression arrays, 
Lockhart, Nature Biotechnology, 14:1675-1680 (1996), hereby expressly incorporated by 
reference. Other techniques include, but are not limited to, quantitative reverse transcriptase 
PCR, Northern analysis and RNase protection. As outlined above, preferably the change in 
expression (i.e. upregulation or downregulation) is at least about 50%, more preferably at 
least about 100%, more preferably at least about 150%, more preferably, at least about 200%, 
with from 300 to at least 1000% being especially preferred. 
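
As a rough illustration of the thresholds above, the following sketch computes a signed percent change and reports the highest preferred threshold it meets. The treatment of downregulation by the inverse ratio is an assumption made for this sketch, since the text does not say how a decrease is expressed as a percentage; the function names are likewise hypothetical.

```python
# Preferred thresholds from the text, in percent, highest first.
THRESHOLDS = [300.0, 200.0, 150.0, 100.0, 50.0]


def percent_change(normal, tumor):
    """Signed percent change in expression between two states.

    Assumption (not from the text): downregulation is scored by the inverse
    ratio, so a 2-fold drop and a 2-fold rise both count as 100%.
    """
    if normal <= 0 or tumor <= 0:
        raise ValueError("expression levels must be positive")
    if tumor >= normal:
        return (tumor / normal - 1.0) * 100.0
    return -(normal / tumor - 1.0) * 100.0


def preference_tier(change):
    """Return the highest threshold met by |change|, or None if below 50%."""
    magnitude = abs(change)
    for threshold in THRESHOLDS:
        if magnitude >= threshold:
            return threshold
    return None


print(preference_tier(percent_change(10.0, 45.0)))
```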

[136] As will be appreciated by those in the art, this may be done by 
evaluation at either the gene transcript, or the protein level; that is, the amount of gene 
expression may be monitored using nucleic acid probes to the DNA or RNA equivalent of the 
gene transcript, and the quantification of gene expression levels, or, alternatively, the final 
gene product itself (protein) can be monitored, for example through the use of antibodies to 
the colorectal cancer protein and standard immunoassays (ELISAs, etc.) or other techniques,
including mass spectroscopy assays, 2D gel electrophoresis assays, etc. Thus, the proteins 
corresponding to colorectal cancer genes, i.e. those identified as being important in a 
colorectal cancer phenotype, can be evaluated in a colorectal cancer diagnostic test. 

[137] In a preferred embodiment, gene expression monitoring is done and a 
number of genes, i.e. an expression profile, is monitored simultaneously, although multiple 
protein expression monitoring can be done as well. Similarly, these assays may be done on 
an individual basis as well. 

[138] In this embodiment, the colorectal cancer nucleic acid probes are 
attached to biochips as outlined herein for the detection and quantification of colorectal 
cancer sequences in a particular cell. The assays are further described below in the example. 

[139] In a preferred embodiment nucleic acids encoding the colorectal 
cancer protein are detected. Although DNA or RNA encoding the colorectal cancer protein 
may be detected, of particular interest are methods wherein the mRNA encoding a colorectal 
cancer protein is detected. The presence of mRNA in a sample is an indication that the 
colorectal cancer gene has been transcribed to form the mRNA, and suggests that the protein
is expressed. Probes to detect the mRNA can be any nucleotide/deoxynucleotide probe that
is complementary to and base pairs with the mRNA and includes but is not limited to 
oligonucleotides, cDNA or RNA. Probes also should contain a detectable label, as defined 
herein. In one method the mRNA is detected after immobilizing the nucleic acid to be 
examined on a solid support such as nylon membranes and hybridizing the probe with the 
sample. Following washing to remove the non-specifically bound probe, the label is 
detected. In another method detection of the mRNA is performed in situ. In this method 
permeabilized cells or tissue samples are contacted with a detectably labeled nucleic acid 
probe for sufficient time to allow the probe to hybridize with the target mRNA. Following 
washing to remove the non-specifically bound probe, the label is detected. For example a 
digoxigenin labeled riboprobe (RNA probe) that is complementary to the mRNA encoding a
colorectal cancer protein is detected by binding the digoxigenin with an anti-digoxigenin
secondary antibody and developed with nitro blue tetrazolium and 5-bromo-4-chloro-3-
indolyl phosphate.

[140] In a preferred embodiment, any of the three classes of proteins as 
described herein (secreted, transmembrane or intracellular proteins) are used in diagnostic 
assays. The colorectal cancer proteins, antibodies, nucleic acids, modified proteins and cells 
containing colorectal cancer sequences are used in diagnostic assays. This can be done on an 
individual gene or corresponding polypeptide level. In a preferred embodiment, the 
expression profiles are used, preferably in conjunction with high throughput screening 
techniques to allow monitoring for expression profile genes and/or corresponding 
polypeptides. 

[141] As described and defined herein, colorectal cancer proteins, including 
intracellular, transmembrane or secreted proteins, find use as markers of colorectal cancer.
Detection of these proteins in putative colorectal cancer tissue or patients allows for a
determination or diagnosis of colorectal cancer. Numerous methods known to those of
ordinary skill in the art find use in detecting colorectal cancer. In one embodiment,
antibodies are used to detect colorectal cancer proteins. A preferred method separates 
proteins from a sample or patient by electrophoresis on a gel (typically a denaturing and 
reducing protein gel, but may be any other type of gel including isoelectric focusing gels and 
the like). Following separation of proteins, the colorectal cancer protein is detected by 
immunoblotting with antibodies raised against the colorectal cancer protein. Methods of 
immunoblotting are well known to those of ordinary skill in the art.

[142] In another preferred method, antibodies to the colorectal cancer
protein find use in in situ imaging techniques. In this method cells are contacted with from 
one to many antibodies to the colorectal cancer protein(s). Following washing to remove 
non-specific antibody binding, the presence of the antibody or antibodies is detected. In one 
embodiment the antibody is detected by incubating with a secondary antibody that contains a 
detectable label. In another method the primary antibody to the colorectal cancer protein(s) 
contains a detectable label. In another preferred embodiment each one of multiple primary 
antibodies contains a distinct and detectable label. This method finds particular use in 
simultaneous screening for a plurality of colorectal cancer proteins. As will be appreciated 
by one of ordinary skill in the art, numerous other histological imaging techniques are useful 
in the invention. 

[143] In a preferred embodiment the label is detected in a fluorometer which 
has the ability to detect and distinguish emissions of different wavelengths. In addition, a 
fluorescence activated cell sorter (FACS) can be used in the method. 

[144] In another preferred embodiment, antibodies find use in diagnosing 
colorectal cancer from blood samples. As previously described, certain colorectal cancer 
proteins are secreted/circulating molecules. Blood samples, therefore, are useful as samples 
to be probed or tested for the presence of secreted colorectal cancer proteins. Antibodies can 
be used to detect the colorectal cancer by any of the previously described immunoassay 
techniques including ELISA, immunoblotting (Western blotting), immunoprecipitation, 
BIACORE technology and the like, as will be appreciated by one of ordinary skill in the art. 

[145] In a preferred embodiment, in situ hybridization of labeled colorectal 
cancer nucleic acid probes to tissue arrays is done. For example, arrays of tissue samples, 
including colorectal cancer tissue and/or normal tissue, are made. In situ hybridization as is 
known in the art can then be done. 

[146] It is understood that when comparing the fingerprints between an 
individual and a standard, the skilled artisan can make a diagnosis as well as a prognosis. It 
is further understood that the genes which indicate the diagnosis may differ from those which 
indicate the prognosis. 

[147] In a preferred embodiment, the colorectal cancer proteins, antibodies, 
nucleic acids, modified proteins and cells containing colorectal cancer sequences are used in 
prognosis assays. As above, gene expression profiles can be generated that correlate to 
colorectal cancer severity, in terms of long term prognosis. Again, this may be done on 
either a protein or gene level, with the use of genes being preferred. As above, the colorectal
cancer probes are attached to biochips for the detection and quantification of colorectal
cancer sequences in a tissue or patient. The assays proceed as outlined for diagnosis. 

[148] In a preferred embodiment, any of the three classes of proteins as 
described herein are used in drug screening assays. The colorectal cancer proteins, 
antibodies, nucleic acids, modified proteins and cells containing colorectal cancer sequences 
are used in drug screening assays or by evaluating the effect of drug candidates on a "gene 
expression profile" or expression profile of polypeptides. In a preferred embodiment, the 
expression profiles are used, preferably in conjunction with high throughput screening 
techniques to allow monitoring for expression profile genes after treatment with a candidate 
agent; see Zlokarnik et al., Science 279, 84-8 (1998); Heid, 1996.

[149] In a preferred embodiment, the colorectal cancer proteins, antibodies, 
nucleic acids, modified proteins and cells containing the native or modified colorectal cancer 
proteins are used in screening assays. That is, the present invention provides novel methods 
for screening for compositions which modulate the colorectal cancer phenotype. As above, 
this can be done on an individual gene level or by evaluating the effect of drug candidates on 
a "gene expression profile". In a preferred embodiment, the expression profiles are used, 
preferably in conjunction with high throughput screening techniques to allow monitoring for 
expression profile genes after treatment with a candidate agent, see Zlokarnik, supra. 

[150] Having identified the differentially expressed genes herein, a variety 
of assays may be executed. In a preferred embodiment, assays may be run on an individual 
gene or protein level. That is, having identified a particular gene as up regulated in colorectal 
cancer, candidate bioactive agents may be screened to modulate this gene's response;
preferably to down regulate the gene, although in some circumstances to up regulate the gene. 
"Modulation" thus includes both an increase and a decrease in gene expression. The 
preferred amount of modulation will depend on the original change of the gene expression in 
normal versus tumor tissue, with changes of at least 10%, preferably 50%, more preferably
100-300%, and in some embodiments 300-1000% or greater. Thus, if a gene exhibits a 4-fold
increase in tumor compared to normal tissue, a decrease of about 4-fold is desired; if a gene
exhibits a 10-fold decrease in tumor compared to normal tissue, a 10-fold increase in
expression in response to a candidate agent is desired.
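
The relationship between the observed change and the desired modulation can be sketched as follows. This is only an illustration of the arithmetic in the two examples above; the function name and the input values are hypothetical.

```python
def desired_modulation(normal_level, tumor_level):
    """Return (direction, fold) a candidate agent would need to achieve
    to bring tumor expression back to the normal level."""
    if normal_level <= 0 or tumor_level <= 0:
        raise ValueError("expression levels must be positive")
    if tumor_level > normal_level:
        # Gene is upregulated in tumor, so a decrease is desired.
        return ("decrease", tumor_level / normal_level)
    if tumor_level < normal_level:
        # Gene is downregulated in tumor, so an increase is desired.
        return ("increase", normal_level / tumor_level)
    return ("none", 1.0)


# Hypothetical expression levels in arbitrary units.
print(desired_modulation(1.0, 4.0))
print(desired_modulation(10.0, 1.0))
```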

[151] As will be appreciated by those in the art, this may be done by 
evaluation at either the gene or the protein level; that is, the amount of gene expression may 
be monitored using nucleic acid probes and the quantification of gene expression levels, or,
alternatively, the gene product itself can be monitored, for example through the use of
antibodies to the colorectal cancer protein and standard immunoassays. 

[152] In a preferred embodiment, gene expression monitoring is done and a 
number of genes, i.e. an expression profile, is monitored simultaneously, although multiple 
protein expression monitoring can be done as well.

[153] In this embodiment, the colorectal cancer nucleic acid probes are 
attached to biochips as outlined herein for the detection and quantification of colorectal 
cancer sequences in a particular cell. The assays are further described below. 

[154] Generally, in a preferred embodiment, a candidate bioactive agent is
added to the cells prior to analysis. Moreover, screens are provided to identify a candidate
bioactive agent which modulates colorectal cancer, modulates colorectal cancer proteins,
binds to a colorectal cancer protein, or interferes with the binding of a colorectal cancer
protein and an antibody.

[155] The term "candidate bioactive agent" or "drug candidate" or
grammatical equivalents as used herein describes any molecule, e.g., protein, oligopeptide,
small organic molecule, polysaccharide, polynucleotide, etc., to be tested for bioactive agents
that are capable of directly or indirectly altering either the colorectal cancer phenotype or the
expression of a colorectal cancer sequence, including both nucleic acid sequences and
protein sequences. In preferred embodiments, the bioactive agents modulate the expression
profiles, or expression profile nucleic acids or proteins provided herein. In a particularly
preferred embodiment, the candidate agent suppresses a colorectal cancer phenotype, for
example to a normal colon tissue fingerprint. Similarly, the candidate agent preferably
suppresses a severe colorectal cancer phenotype. Generally a plurality of assay mixtures are
run in parallel with different agent concentrations to obtain a differential response to the
various concentrations. Typically, one of these concentrations serves as a negative control,
i.e., at zero concentration or below the level of detection.
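
A parallel run at several concentrations, with the zero-concentration mixture as negative control, might be summarized as in the sketch below. The normalization to the control signal, the function name and the readings are all assumptions introduced for illustration; the text does not prescribe how the differential response is computed.

```python
def relative_response(readings):
    """Normalize each signal to the zero-concentration negative control.

    readings maps agent concentration to measured signal; the entry at
    concentration 0.0 is treated as the negative control.
    """
    if 0.0 not in readings:
        raise ValueError("a zero-concentration negative control is required")
    baseline = readings[0.0]
    if baseline == 0:
        raise ValueError("control signal must be non-zero")
    return {conc: signal / baseline for conc, signal in sorted(readings.items())}


# Hypothetical signals (arbitrary units) at four agent concentrations.
readings = {0.0: 200.0, 1.0: 150.0, 10.0: 100.0, 100.0: 50.0}
response = relative_response(readings)
print(response)
```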

[156] In one aspect, a candidate agent will neutralize the effect of a 
colorectal cancer protein. By "neutralize" is meant that activity of a protein is either 
inhibited or counteracted so as to have substantially no effect on a cell.

[157] Candidate agents encompass numerous chemical classes, though
typically they are organic molecules, preferably small organic compounds having a molecular
weight of more than 100 and less than about 2,500 daltons. Preferred small molecules are
less than 2000, or less than 1500 or less than 1000 or less than 500 D. Candidate agents
comprise functional groups necessary for structural interaction with proteins, particularly
hydrogen bonding, and typically include at least an amine, carbonyl, hydroxyl or carboxyl
group, preferably at least two of the functional chemical groups. The candidate agents often 
comprise cyclical carbon or heterocyclic structures and/or aromatic or polyaromatic 
structures substituted with one or more of the above functional groups. Candidate agents are 
also found among biomolecules including peptides, saccharides, fatty acids, steroids, purines, 
pyrimidines, derivatives, structural analogs or combinations thereof. Particularly preferred 
are peptides. 
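
The molecular weight window and functional group preference described above could be applied as a simple pre-filter, sketched here. The `Candidate` record, the compound names and their properties are hypothetical placeholders, not compounds disclosed herein.

```python
from dataclasses import dataclass, field

# Functional groups named in the text as typical of candidate agents.
CORE_GROUPS = {"amine", "carbonyl", "hydroxyl", "carboxyl"}


@dataclass
class Candidate:
    name: str
    mol_weight: float  # daltons
    groups: set = field(default_factory=set)


def is_small_organic(candidate):
    """More than 100 and less than about 2,500 daltons, per the text."""
    return 100 < candidate.mol_weight < 2500


def prefilter(candidates, min_groups=1):
    """Names of candidates in the weight window with enough core groups."""
    return [
        c.name
        for c in candidates
        if is_small_organic(c) and len(c.groups & CORE_GROUPS) >= min_groups
    ]


# Hypothetical library entries for illustration only.
library = [
    Candidate("cmpd-A", 350.0, {"amine", "hydroxyl"}),
    Candidate("cmpd-B", 3200.0, {"carboxyl"}),
    Candidate("cmpd-C", 180.0, {"ether"}),
    Candidate("cmpd-D", 480.0, {"carbonyl"}),
]
print(prefilter(library))
print(prefilter(library, min_groups=2))
```

Setting `min_groups=2` reflects the stated preference for at least two of the functional chemical groups.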

[158] Candidate agents are obtained from a wide variety of sources including 
libraries of synthetic or natural compounds. For example, numerous means are available for 
random and directed synthesis of a wide variety of organic compounds and biomolecules, 
including expression of randomized oligonucleotides. Alternatively, libraries of natural 
compounds in the form of bacterial, fungal, plant and animal extracts are available or readily 
produced. Additionally, natural or synthetically produced libraries and compounds are 
readily modified through conventional chemical, physical and biochemical means. Known 
pharmacological agents may be subjected to directed or random chemical modifications, such 
as acylation, alkylation, esterification, amidification to produce structural analogs. 

[159] In a preferred embodiment, the candidate bioactive agents are 
proteins. By "protein" herein is meant at least two covalently attached amino acids, which 
includes proteins, polypeptides, oligopeptides and peptides. The protein may be made up of 
naturally occurring amino acids and peptide bonds, or synthetic peptidomimetic structures. 
Thus "amino acid", or "peptide residue", as used herein means both naturally occurring and 
synthetic amino acids. For example, homo-phenylalanine, citrulline and norleucine are
considered amino acids for the purposes of the invention. "Amino acid" also includes imino 
acid residues such as proline and hydroxyproline. The side chains may be in either the (R) 
or the (S) configuration. In the preferred embodiment, the amino acids are in the (S) or L- 
configuration. If non-naturally occurring side chains are used, non-amino acid substituents 
may be used, for example to prevent or retard in vivo degradations. 

[160] In a preferred embodiment, the candidate bioactive agents are naturally 
occurring proteins or fragments of naturally occurring proteins. Thus, for example, cellular 
extracts containing proteins, or random or directed digests of proteinaceous cellular extracts, 
may be used. In this way libraries of procaryotic and eucaryotic proteins may be made for 
screening in the methods of the invention. Particularly preferred in this embodiment are 
libraries of bacterial, fungal, viral, and mammalian proteins, with the latter being preferred, 
and human proteins being especially preferred. 

[161] In a preferred embodiment, the candidate bioactive agents are peptides 
of from about 5 to about 30 amino acids, with from about 5 to about 20 amino acids being 
preferred, and from about 7 to about 15 being particularly preferred. The peptides may be 
digests of naturally occurring proteins as is outlined above, random peptides, or "biased" 
random peptides. By "randomized" or grammatical equivalents herein is meant that each 
nucleic acid and peptide consists of essentially random nucleotides and amino acids, 
respectively. Since generally these random peptides (or nucleic acids, discussed below) are 
chemically synthesized, they may incorporate any nucleotide or amino acid at any position. 
The synthetic process can be designed to generate randomized proteins or nucleic acids, to 
allow the formation of all or most of the possible combinations over the length of the 
sequence, thus forming a library of randomized candidate bioactive proteinaceous agents. 

[162] In one embodiment, the library is fully randomized, with no sequence 
preferences or constants at any position. In a preferred embodiment, the library is biased. 
That is, some positions within the sequence are either held constant, or are selected from a 
limited number of possibilities. For example, in a preferred embodiment, the nucleotides or 
amino acid residues are randomized within a defined class, for example, of hydrophobic 
amino acids, hydrophilic residues, sterically biased (either small or large) residues, towards 
the creation of nucleic acid binding domains, the creation of cysteines, for cross-linking, 
prolines for SH-3 domains, serines, threonines, tyrosines or histidines for phosphorylation 
sites, etc., or to purines, etc. 
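
The difference between a fully randomized and a biased peptide library can be sketched in silico as follows. This is an illustration only: the specification describes chemical synthesis, the hydrophobic residue grouping is one common convention assumed here, and the helper names and bias template are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
HYDROPHOBIC = "AVILMFWY"  # assumed grouping; classifications vary


def random_peptide(length, rng, template=None):
    """Build one peptide; template maps a position to its allowed residues.

    Positions absent from the template are fully randomized.
    """
    template = template or {}
    return "".join(
        rng.choice(template.get(i, AMINO_ACIDS)) for i in range(length)
    )


def make_library(size, length, rng, template=None):
    """Generate a list of peptides sharing the same bias template."""
    return [random_peptide(length, rng, template) for _ in range(size)]


rng = random.Random(0)  # seeded only so the sketch is repeatable
# Hypothetical bias: fixed cysteines at both ends (e.g., for cross-linking)
# and a hydrophobic residue at position 4.
bias = {0: "C", 9: "C", 4: HYDROPHOBIC}
biased_library = make_library(5, 10, rng, bias)
random_library = make_library(5, 10, rng)
print(biased_library)
```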

[163] In a preferred embodiment, the candidate bioactive agents are nucleic 
acids, as defined above. 

[164] As described above generally for proteins, nucleic acid candidate 
bioactive agents may be naturally occurring nucleic acids, random nucleic acids, or "biased" 
random nucleic acids. For example, digests of procaryotic or eucaryotic genomes may be 
used as is outlined above for proteins. 

[165] In a preferred embodiment, the candidate bioactive agents are organic 
chemical moieties, a wide variety of which are available in the literature. 

[166] After the candidate agent has been added and the cells allowed to 
incubate for some period of time, the sample containing the target sequences to be analyzed is 
added to the biochip. If required, the target sequence is prepared using known techniques. 
For example, the sample may be treated to lyse the cells, using known lysis buffers, 
electroporation, etc., with purification and/or amplification such as PCR occurring as needed, 
as will be appreciated by those in the art. For example, an in vitro transcription with labels
covalently attached to the nucleosides is done. Generally, the nucleic acids are labeled with
biotin-FITC or PE, or with Cy3 or Cy5.

[167] In a preferred embodiment, the target sequence is labeled with, for 
example, a fluorescent, a chemiluminescent, a chemical, or a radioactive signal, to provide a 
means of detecting the target sequence's specific binding to a probe. The label also can be an 
enzyme, such as alkaline phosphatase or horseradish peroxidase, which when provided with
an appropriate substrate produces a product that can be detected. Alternatively, the label can
be a labeled compound or small molecule, such as an enzyme inhibitor, that binds but is not
catalyzed or altered by the enzyme. The label also can be a moiety or compound, such as an
epitope tag or biotin which specifically binds to streptavidin. For the example of biotin, the
streptavidin is labeled as described above, thereby providing a detectable signal for the
bound target sequence. As known in the art, unbound labeled streptavidin is removed prior to
analysis. 

[168] As will be appreciated by those in the art, these assays can be direct 
hybridization assays or can comprise "sandwich assays", which include the use of multiple 
probes, as is generally outlined in U.S. Patent Nos. 5,681,702, 5,597,909, 5,545,730,
5,594,117, 5,591,584, 5,571,670, 5,580,731, 5,571,670, 5,591,584, 5,624,802, 5,635,352,
5,594,118, 5,359,100, 5,124,246 and 5,681,697, all of which are hereby incorporated by
reference. In this embodiment, in general, the target nucleic acid is prepared as outlined 
above, and then added to the biochip comprising a plurality of nucleic acid probes, under 
conditions that allow the formation of a hybridization complex. 

[169] A variety of hybridization conditions may be used in the present 
invention, including high, moderate and low stringency conditions as outlined above. The 
assays are generally run under stringency conditions which allow formation of the label
probe hybridization complex only in the presence of target. Stringency can be controlled by
altering a step parameter that is a thermodynamic variable, including, but not limited to,
temperature, formamide concentration, salt concentration, chaotropic salt concentration, pH,
organic solvent concentration, etc. 

[170] These parameters may also be used to control non-specific binding, as 
is generally outlined in U.S. Patent No. 5,681,697. Thus it may be desirable to perform 
certain steps at higher stringency conditions to reduce non-specific binding. 

[171] The reactions outlined herein may be accomplished in a variety of 
ways, as will be appreciated by those in the art. Components of the reaction may be added 
simultaneously, or sequentially, in any order, with preferred embodiments outlined below. In 
addition, a variety of other reagents may be included in the assays.
These include reagents like salts, buffers, neutral proteins, e.g. albumin, detergents, etc., which
may be used to facilitate optimal hybridization and detection, and/or reduce non-specific or 
background interactions. Also reagents that otherwise improve the efficiency of the assay, 
such as protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may be used, 
depending on the sample preparation methods and purity of the target. 

[172] Once the assay is run, the data is analyzed to determine the expression 
levels, and changes in expression levels as between states, of individual genes, forming a 
gene expression profile. 
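
One simple way to express such a profile is as a per-gene log ratio between two states. The
Python sketch below is illustrative only: the gene names and intensities are invented, and
the pseudocount and the choice of a log2 ratio are assumptions rather than the analysis
prescribed by the specification.

```python
import math

def expression_profile(normal, tumor, pseudocount=1.0):
    """Return {gene: log2(tumor / normal)} for genes measured in both states.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detectable signal in one state.
    """
    profile = {}
    for gene in normal.keys() & tumor.keys():
        ratio = (tumor[gene] + pseudocount) / (normal[gene] + pseudocount)
        profile[gene] = math.log2(ratio)
    return profile

# Hypothetical signal intensities; the gene names are placeholders.
normal = {"geneA": 99.0, "geneB": 399.0, "geneC": 49.0}
tumor = {"geneA": 399.0, "geneB": 99.0, "geneC": 49.0}
profile = expression_profile(normal, tumor)
```

Positive values indicate genes up-regulated in the second state, negative values indicate
down-regulated genes, and values near zero indicate no change.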

[173] The screens are done to identify drugs or bioactive agents that 
modulate the colorectal cancer phenotype. Specifically, there are several types of screens 
that can be run. A preferred embodiment is in the screening of candidate agents that can 
induce or suppress a particular expression profile, thus preferably generating the associated 
phenotype. That is, candidate agents that can mimic or produce an expression profile in
colorectal cancer similar to the expression profile of normal colon tissue are expected to result
in a suppression of the colorectal cancer phenotype. Thus, in this embodiment, mimicking an
expression profile, or changing one profile to another, is the goal. 
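
The idea of changing one profile to another can be sketched as a distance comparison: an
agent is of interest if the treated profile lies closer to the normal profile than the
untreated cancer profile does. The Euclidean distance, the gene names and the values below
are assumptions chosen for illustration; the specification does not prescribe a metric.

```python
import math

def profile_distance(a, b):
    """Euclidean distance between two profiles over their shared genes."""
    shared = sorted(a.keys() & b.keys())
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in shared))

def moves_toward_normal(normal, cancer, treated):
    """True if the treated profile is closer to normal than the untreated one."""
    return profile_distance(treated, normal) < profile_distance(cancer, normal)

# Hypothetical log-scale expression values; gene names are placeholders.
normal = {"geneA": 1.0, "geneB": 5.0, "geneC": 3.0}
cancer = {"geneA": 6.0, "geneB": 1.0, "geneC": 3.0}
treated = {"geneA": 2.0, "geneB": 4.0, "geneC": 3.0}
is_hit = moves_toward_normal(normal, cancer, treated)
```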

[174] In a preferred embodiment, as for the diagnosis and prognosis 
applications, having identified the differentially expressed genes important in any one state, 
screens can be run to alter the expression of the genes individually. That is, rather than
trying to mimic all or part of an expression profile, screening for modulation of the
regulation of expression of a single gene can be done. Thus, for example, particularly in the
case of target genes whose presence or absence
is unique between two states, screening is done for modulators of the target gene expression. 

[175] In a preferred embodiment, screening is done to alter the biological 
function of the expression product of the differentially expressed gene. Again, having 
identified the importance of a gene in a particular state, screening for agents that bind and/or 
modulate the biological activity of the gene product can be run as is more fully outlined 
below. 

[176] Thus, screening of candidate agents that modulate the colorectal cancer 
phenotype either at the gene expression level or the protein level can be done. 

[177] In addition, screens can be done for novel genes that are induced in
response to a candidate agent. After identifying a candidate agent based upon its ability to 
suppress a colorectal cancer expression pattern leading to a normal expression pattern, or 
modulate a single colorectal cancer gene expression profile so as to mimic the expression of
the gene from normal tissue, a screen as described above can be performed to identify genes
that are specifically modulated in response to the agent. Comparing expression profiles
between normal tissue and agent treated colorectal cancer tissue reveals genes that are not
expressed in normal tissue or colorectal cancer tissue, but are expressed in agent treated
tissue. These agent specific sequences can be identified and used by any of the methods
described herein for colorectal cancer genes or proteins. In particular these sequences and
the proteins they encode find use in marking or identifying agent treated cells. In addition,
antibodies can be raised against the agent induced proteins and used to target novel
therapeutics to the treated colorectal cancer tissue sample.

[178] Thus, in one embodiment, a candidate agent is administered to a 
population of colorectal cancer cells, that thus has an associated colorectal cancer
expression profile. By "administration" or "contacting" herein is meant that the candidate
agent is added to the cells in such a manner as to allow the agent to act upon the cell, whether
by uptake and intracellular action, or by action at the cell surface. In some embodiments,
nucleic acid encoding a proteinaceous candidate agent (i.e. a peptide) may be put into a viral
construct such as a retroviral construct and added to the cell, such that expression of the
peptide agent is accomplished; see PCT US97/01019, hereby expressly incorporated by
reference.

[179] Once the candidate agent has been administered to the cells, the cells
can be washed if desired and are allowed to incubate under preferably physiological
conditions for some period of time. The cells are then harvested and a new gene expression
profile is generated, as outlined herein.

[180] Thus, for example, colorectal cancer tissue may be screened for 
agents that reduce or suppress the colorectal cancer phenotype. A change in at least one
gene of the expression profile indicates that the agent has an effect on colorectal cancer 
activity. By defining such a signature for the colorectal cancer phenotype, screens for new 
drugs that alter the phenotype can be devised. With this approach, the drug target need not be 
known and need not be represented in the original expression screening platform, nor does 
the level of transcript for the target protein need to change.

[181] In a preferred embodiment, as outlined above, screens may be done on 
individual genes and gene products (proteins). That is, having identified a particular 
differentially expressed gene as important in a particular state, screening of modulators of 
either the expression of the gene or the gene product itself can be done. The gene products of 
differentially expressed genes are sometimes referred to herein as "colorectal cancer
modulator proteins". The colorectal cancer modulator protein may be a fragment, or 
alternatively, be the full length protein to a fragment shown herein. Preferably, the colorectal 
cancer modulator protein is a fragment of approximately 14 to 24 amino acids long. More 
preferably the fragment is a soluble fragment. 

[182] In a preferred embodiment, the fragment is charged and from the
C-terminus. In one embodiment, the C-terminus of the fragment is kept as a free acid and the
N-terminus is a free amine to aid in coupling, i.e., to cysteine. In another embodiment, the
fragment is an internal peptide overlapping a hydrophilic stretch of the protein. In a preferred
embodiment, the termini are blocked. In another preferred embodiment, the fragment is a
novel fragment from the N-terminal. In one embodiment, the fragment excludes sequence
outside of the N-terminal; in another embodiment, the fragment includes at least a portion of
the N-terminal. "N-terminal" is used interchangeably herein with "N-terminus" which is
further described above.

[183] In one embodiment the colorectal cancer proteins are conjugated to an 
immunogenic agent as discussed herein. In one embodiment the colorectal cancer protein is 
conjugated to BSA. 

[184] Thus, in a preferred embodiment, screening for modulators of 
expression of specific genes can be done. This will be done as outlined above, but in general 
the expression of only one or a few genes is evaluated.

[185] In a preferred embodiment, screens are designed to first find candidate 
agents that can bind to differentially expressed proteins, and then these agents may be used in 
assays that evaluate the ability of the candidate agent to modulate the activity of the
differentially expressed proteins. Thus, as will be appreciated by those in the art, there are a
number of different assays which may be run: binding assays and activity assays.

[186] In a preferred embodiment, binding assays are done. In general, 
purified or isolated gene product is used; that is, the gene products of one or more 
differentially expressed nucleic acids are made. In general, this is done as is known in the art. 
For example, antibodies are generated to the protein gene products, and standard 
immunoassays are run to determine the amount of protein present. Alternatively, cells 
comprising the colorectal cancer proteins can be used in the assays. 

[187] Thus, in a preferred embodiment, the methods comprise combining a 
colorectal cancer protein and a candidate bioactive agent, and determining the binding of the 
candidate agent to the colorectal cancer protein. Preferred embodiments utilize the human 
colorectal cancer protein, although other mammalian proteins may also be used, for example
for the development of animal models of human disease. In some embodiments, as outlined 
herein, variant or derivative colorectal cancer proteins may be used. 

[188] Generally, in a preferred embodiment of the methods herein, the 
colorectal cancer protein or the candidate agent is non-diffusably bound to an insoluble 
support having isolated sample receiving areas (e.g. a microtiter plate, an array, etc.). The 
insoluble supports may be made of any composition to which the compositions can be bound,
that is readily separated from soluble material, and that is otherwise compatible with the overall
method of screening. The surface of such supports may be solid or porous and of any 
convenient shape. Examples of suitable insoluble supports include microtiter plates, arrays, 
membranes and beads. These are typically made of glass, plastic (e.g., polystyrene), 
polysaccharides, nylon or nitrocellulose, teflon, etc. Microtiter plates and arrays are 
especially convenient because a large number of assays can be carried out simultaneously, 
using small amounts of reagents and samples. The particular manner of binding of the 
composition is not crucial so long as it is compatible with the reagents and overall methods of 
the invention, maintains the activity of the composition and is nondiffusable. Preferred 
methods of binding include the use of antibodies (which do not sterically block either the 
ligand binding site or activation sequence when the protein is bound to the support), direct 
binding to "sticky" or ionic supports, chemical crosslinking, the synthesis of the protein or 
agent on the surface, etc. Following binding of the protein or agent, excess unbound material 
is removed by washing. The sample receiving areas may then be blocked through incubation 
with bovine serum albumin (BSA), casein or other innocuous protein or other moiety.

[189] In a preferred embodiment, the colorectal cancer protein is bound to 
the support, and a candidate bioactive agent is added to the assay. Alternatively, the 
candidate agent is bound to the support and the colorectal cancer protein is added. Novel 
binding agents include specific antibodies, non-natural binding agents identified in screens of 
chemical libraries, peptide analogs, etc. Of particular interest are screening assays for agents 
that have a low toxicity for human cells. A wide variety of assays may be used for this 
purpose, including labeled in vitro protein-protein binding assays, electrophoretic mobility 
shift assays, immunoassays for protein binding, functional assays (phosphorylation assays, 
etc.) and the like. 

[190] The determination of the binding of the candidate bioactive agent to 
the colorectal cancer protein may be done in a number of ways. In a preferred embodiment, 
the candidate bioactive agent is labeled, and binding determined directly. For example, this 
may be done by attaching all or a portion of the colorectal cancer protein to a solid support,
adding a labeled candidate agent (for example a fluorescent label), washing off excess 
reagent, and determining whether the label is present on the solid support. Various blocking 
and washing steps may be utilized as is known in the art. 

[191] By "labeled" herein is meant that the compound is either directly or 
indirectly labeled with a label which provides a detectable signal, e.g. radioisotope, 
fluorescers, enzyme, antibodies, particles such as magnetic particles, chemiluminescers, or 
specific binding molecules, etc. Specific binding molecules include pairs, such as biotin and 
streptavidin, digoxin and antidigoxin etc. For the specific binding members, the 
complementary member would normally be labeled with a molecule which provides for 
detection, in accordance with known procedures, as outlined above. The label can directly or 
indirectly provide a detectable signal. 

[192] In some embodiments, only one of the components is labeled. For 
example, the proteins (or proteinaceous candidate agents) may be labeled at tyrosine 
positions using 125I, or with fluorophores. Alternatively, more than one component may be
labeled with different labels; using 125I for the proteins, for example, and a fluorophore for the
candidate agents. 

[193] In a preferred embodiment, the binding of the candidate bioactive 
agent is determined through the use of competitive binding assays. In this embodiment, the 
competitor is a binding moiety known to bind to the target molecule (i.e. colorectal cancer
protein), such as an antibody, peptide, binding partner, ligand, etc. Under certain
circumstances, there may be competitive binding as between the bioactive agent and the
binding moiety, with the
binding moiety displacing the bioactive agent. 

[194] In one embodiment, the candidate bioactive agent is labeled. Either 
the candidate bioactive agent, or the competitor, or both, is added first to the protein for a 
time sufficient to allow binding, if present. Incubations may be performed at any 
temperature which facilitates optimal activity, typically between 4 and 40°C. Incubation 
periods are selected for optimum activity, but may also be optimized to facilitate rapid
high-throughput screening. Typically between 0.1 and 1 hour will be sufficient. Excess reagent is
generally removed or washed away. The second component is then added, and the presence 
or absence of the labeled component is followed, to indicate binding. 

[195] In a preferred embodiment, the competitor is added first, followed by 
the candidate bioactive agent. Displacement of the competitor is an indication that the 
candidate bioactive agent is binding to the colorectal cancer protein and thus is capable of 
binding to, and potentially modulating, the activity of the colorectal cancer protein. In this
embodiment, either component can be labeled. Thus, for example, if the competitor is
labeled, the presence of label in the wash solution indicates displacement by the agent.
Alternatively, if the candidate bioactive agent is labeled, the presence of the label on the
support indicates displacement.

[196] In an alternative embodiment, the candidate bioactive agent is added 
first, with incubation and washing, followed by the competitor. The absence of binding by 
the competitor may indicate that the bioactive agent is bound to the colorectal cancer protein 
with a higher affinity. Thus, if the candidate bioactive agent is labeled, the presence of the 
label on the support, coupled with a lack of competitor binding, may indicate that the
candidate agent is capable of binding to the colorectal cancer protein. 

[197] In a preferred embodiment, the methods comprise differential 
screening to identify bioactive agents that are capable of modulating the activity of the
colorectal cancer proteins. In this embodiment, the methods comprise combining a
colorectal cancer protein and a competitor in a first sample. A second sample comprises a
candidate bioactive agent, a colorectal cancer protein and a competitor. The binding of the
competitor is determined for both samples, and a change, or difference in binding between
the two samples indicates the presence of an agent capable of binding to the colorectal
cancer protein and potentially modulating its activity. That is, if the binding of the
competitor is different in the second sample relative to the first sample, the agent is capable
of binding to the colorectal cancer protein.
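
The two-sample comparison can be sketched numerically. The Python example below assumes a
labeled competitor whose bound signal is read in arbitrary units, and treats a reduction in
that signal as displacement by the agent. The 50% threshold, the function names and the
readings are hypothetical choices for illustration, and the sketch covers only a decrease
in competitor binding, not every kind of difference contemplated above.

```python
def percent_displacement(signal_without_agent, signal_with_agent):
    """Percent reduction in bound labeled competitor caused by the agent."""
    if signal_without_agent <= 0:
        raise ValueError("reference signal must be positive")
    change = signal_without_agent - signal_with_agent
    return 100.0 * change / signal_without_agent

def agent_binds(signal_without_agent, signal_with_agent, threshold=50.0):
    """Flag the agent as a binder if displacement meets the assumed threshold."""
    return percent_displacement(signal_without_agent, signal_with_agent) >= threshold

first_sample = 2000.0   # protein + labeled competitor
second_sample = 500.0   # protein + labeled competitor + candidate agent
displacement = percent_displacement(first_sample, second_sample)
```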

[198] Alternatively, a preferred embodiment utilizes differential screening to 
identify drug candidates that bind to the native colorectal cancer protein, but cannot bind to 
modified colorectal cancer proteins. The structure of the colorectal cancer protein may be 
modeled, and used in rational drug design to synthesize agents that interact with that site.
Drug candidates that affect colorectal cancer bioactivity are also identified by screening 
drugs for the ability to either enhance or reduce the activity of the protein. 

[199] Positive controls and negative controls may be used in the assays. 
Preferably all control and test samples are performed in at least triplicate to obtain 
statistically significant results. Incubation of all samples is for a time sufficient for the
binding of the agent to the protein. Following incubation, all samples are washed free of
non-specifically bound material and the amount of bound, generally labeled agent determined.
For example, where a radiolabel is employed, the samples may be counted in a scintillation 
counter to determine the amount of bound compound. 
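
As a sketch of how triplicate control and test readings might be compared, the Python
example below computes Welch's t statistic from two sets of replicate counts. The counts are
invented, and the choice of Welch's test is an assumption of this example; the specification
does not name a statistical method.

```python
import math
import statistics

def welch_t(control, test):
    """Welch's t statistic for two independent groups of replicates."""
    m1, m2 = statistics.mean(control), statistics.mean(test)
    v1, v2 = statistics.variance(control), statistics.variance(test)
    return (m2 - m1) / math.sqrt(v1 / len(control) + v2 / len(test))

# Hypothetical counts per minute from triplicate wells.
negative_control = [100.0, 110.0, 90.0]
test_agent = [500.0, 520.0, 480.0]
t_stat = welch_t(negative_control, test_agent)
```

A large absolute value of the statistic suggests that the difference between the groups is
unlikely to be due to replicate noise alone; converting it to a p-value requires the
t distribution, which is outside this sketch.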



[200] A variety of other reagents may be included in the screening assays.
These include reagents like salts, neutral proteins, e.g. albumin, detergents, etc., which may be
used to facilitate optimal protein-protein binding and/or reduce non-specific or background
interactions. Also reagents that otherwise improve the efficiency of the assay, such as
protease inhibitors, nuclease inhibitors, anti-microbial agents, etc., may be used. The mixture
of components may be added in any order that provides for the requisite binding. 

[201] Screening for agents that modulate the activity of colorectal cancer 
proteins may also be done. In a preferred embodiment, methods for screening for a bioactive 
agent capable of modulating the activity of colorectal cancer proteins comprise the steps of 
adding a candidate bioactive agent to a sample of colorectal cancer proteins, as above, and
determining an alteration in the biological activity of colorectal cancer proteins.
"Modulating the activity of colorectal cancer" includes an increase in activity, a decrease in
activity, or a change in the type or kind of activity present. Thus, in this embodiment, the
candidate agent should both bind to colorectal cancer proteins (although this may not be
necessary), and alter their biological or biochemical activity as defined herein. The methods
include both in vitro screening methods, as are generally outlined above, and in vivo
screening of cells for alterations in the presence, distribution, activity or amount of colorectal
cancer proteins.

[202] Thus, in this embodiment, the methods comprise combining a
colorectal cancer sample and a candidate bioactive agent, and evaluating the effect on
colorectal cancer activity. By "colorectal cancer activity" or grammatical equivalents herein
is meant one of the colorectal cancer's biological activities, including, but not limited to, cell
division, preferably in colon tissue, cell proliferation, tumor growth, and transformation of
cells. In one embodiment, colorectal cancer activity includes activation of a gene identified
by a nucleic acid of Table 1. An inhibitor of colorectal cancer activity inhibits any one
or more colorectal cancer activities.

[203] In a preferred embodiment, the activity of the colorectal cancer protein 
is increased; in another preferred embodiment, the activity of the colorectal cancer protein is 
decreased. Thus, bioactive agents that are antagonists are preferred in some embodiments,
and bioactive agents that are agonists may be preferred in other embodiments.

[204] In a preferred embodiment, the invention provides methods for 
screening for bioactive agents capable of modulating the activity of a colorectal cancer 
protein. The methods comprise adding a candidate bioactive agent, as defined above, to a 
cell comprising colorectal cancer proteins. Preferred cell types include almost any cell. The 
cells contain a recombinant nucleic acid that encodes a colorectal cancer protein. In a
preferred embodiment, a library of candidate agents is tested on a plurality of cells.

[205] In one aspect, the assays are evaluated in the presence or absence or 
previous or subsequent exposure to physiological signals, for example hormones, antibodies,
peptides, antigens, cytokines, growth factors, action potentials, pharmacological agents
including chemotherapeutics, radiation, carcinogenics, or other cells (i.e. cell-cell contacts).
In another example, the determinations are made at different stages of the cell cycle
process. 

[206] In this way, bioactive agents are identified. Compounds with 
pharmacological activity are able to enhance or interfere with the activity of the colorectal
cancer protein. In one embodiment, "colorectal cancer protein activity" as used herein
includes at least one of the following: colorectal cancer activity, binding to the colorectal
cancer protein, activation of the colorectal cancer protein or activation of substrates of the
colorectal cancer protein by the colorectal cancer protein. In one embodiment, colorectal
cancer activity is defined as the unregulated proliferation of colon tissue, or the growth of
cancer in colon tissue. In one aspect, colorectal cancer activity as defined herein is related to
the activity of the colorectal cancer protein in the upregulation of the colorectal cancer
protein in colon cancer tissue.

[207] In another embodiment, colorectal cancer protein activity includes at 
least one of the following: colorectal cancer activity, binding to the CBF9 nucleic acid or
polypeptide of Table 2 or binding to a nucleic acid of Table 1, or a peptide encoded by a
nucleic acid of Table 1 or activation of substrates of the gene products identified by a nucleic
acid of Table 1 or substrates of CBF9, which is shown in Table 2. In one aspect, colorectal
cancer activity as defined herein is related to the activity of genes defined by the nucleic acids
of Table 1 or of CBF9 as defined in Table 2, in colon cancer tissue.

[208] In one embodiment, a method of inhibiting colon cancer cell division is 
provided. The method comprises administration of a colorectal cancer inhibitor. 

[209] In another embodiment, a method of inhibiting tumor growth is 
provided. The method comprises administration of a colorectal cancer inhibitor. 

[210] In a further embodiment, methods of treating cells or individuals with
cancer are provided. The method comprises administration of a colorectal cancer inhibitor.

[211] In one embodiment, a colorectal cancer inhibitor is an antibody as 
discussed above. In another embodiment, the colorectal cancer inhibitor is an antisense 
molecule. Antisense molecules as used herein include antisense or sense oligonucleotides 
comprising a single-stranded nucleic acid sequence (either RNA or DNA) capable of binding
to target mRNA (sense) or DNA (antisense) sequences for colorectal cancer molecules. A
preferred antisense molecule is for the colorectal cancer sequences referenced in Table 1 or
Table 2, or for a ligand or activator thereof. Antisense or sense oligonucleotides, according
to the present invention, comprise a fragment generally at least about 14 nucleotides,
preferably from about 14 to 30 nucleotides. The ability to derive an antisense or a sense
oligonucleotide, based upon a cDNA sequence encoding a given protein is described in, for
example, Stein and Cohen (Cancer Res. 48:2659, 1988) and van der Krol et al.
(BioTechniques 6:958, 1988).
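
A minimal Python sketch of deriving candidate antisense oligonucleotides from a cDNA
sequence follows. It simply takes the reverse complement of each window of the chosen
length, restricted here to the preferred range of 14 to 30 nucleotides. The sequence is
made up, the function names are hypothetical, and the sketch does not attempt the design
considerations discussed in the cited references.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def antisense_oligos(cdna, length=20):
    """Yield (start, oligo) for every window of the given length.

    Each oligo is the reverse complement of the sense-strand window and is
    therefore complementary to the corresponding stretch of the transcript.
    """
    if not 14 <= length <= 30:
        raise ValueError("length outside the 14 to 30 nucleotide range")
    for start in range(len(cdna) - length + 1):
        yield start, reverse_complement(cdna[start:start + length])

cdna = "ATGGCCAAGCTTGGATCCGAATTC"  # made-up 24-nucleotide sequence
oligos = list(antisense_oligos(cdna, length=14))
```
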

[212] Antisense molecules may be introduced into a cell containing the target
nucleotide sequence by formation of a conjugate with a ligand binding molecule, as described
in WO 91/04753. Suitable ligand binding molecules include, but are not limited to, cell
surface receptors, growth factors, other cytokines, or other ligands that bind to cell surface
receptors. Preferably, conjugation of the ligand binding molecule does not substantially
interfere with the ability of the ligand binding molecule to bind to its corresponding molecule
or receptor, or block entry of the sense or antisense oligonucleotide or its conjugated version
into the cell. Alternatively, a sense or an antisense oligonucleotide may be introduced into a
cell containing the target nucleic acid sequence by formation of an oligonucleotide-lipid
complex, as described in WO 90/10448. It is understood that antisense molecules
or knock out and knock in models may also be used in screening assays as discussed above,
in addition to methods of treatment.

[213] The compounds having the desired pharmacological activity may be 
administered in a physiologically acceptable carrier to a host, as previously described. The 
agents may be administered in a variety of ways, orally, parenterally e.g., subcutaneously, 
intraperitoneally, intravascularly, etc. Depending upon the manner of introduction, the
compounds may be formulated in a variety of ways. The concentration of therapeutically
active compound in the formulation may vary from about 0.1-100 wt.%. The agents may be
administered alone or in combination with other treatments, e.g., radiation.

[214] The pharmaceutical compositions can be prepared in various forms, 
such as granules, tablets, pills, suppositories, capsules, suspensions, salves, lotions and the
like. Pharmaceutical grade organic or inorganic carriers and/or diluents suitable for oral and 
topical use can be used to make up compositions containing the therapeutically-active 
compounds. Diluents known to the art include aqueous media, vegetable and animal oils and 
fats. Stabilizing agents, wetting and emulsifying agents, salts for varying the osmotic 
pressure or buffers for securing an adequate pH value, and skin penetration enhancers can be
used as auxiliary agents. 

[215] Without being bound by theory, it appears that the various colorectal 
cancer sequences are important in colorectal cancer. Accordingly, disorders based on
mutant or variant colorectal cancer genes may be determined. In one embodiment, the
invention provides methods for identifying cells containing variant colorectal cancer genes
comprising determining all or part of the sequence of at least one endogenous colorectal
cancer gene in a cell. As will be appreciated by those in the art, this may be done using any
number of sequencing techniques. In a preferred embodiment, the invention provides 
methods of identifying the colorectal cancer genotype of an individual comprising 
determining all or part of the sequence of at least one colorectal cancer gene of the 
individual. This is generally done in at least one tissue of the individual, and may include the 
evaluation of a number of tissues or different samples of the same tissue. The method may 
include comparing the sequence of the sequenced colorectal cancer gene to a known 
colorectal cancer gene, i.e. a wild-type gene. 

[216] The sequence of all or part of the colorectal cancer gene can then be 
compared to the sequence of a known colorectal cancer gene to determine if any differences 
exist. This can be done using any number of known homology programs, such as Bestfit, etc. 
In a preferred embodiment, the presence of a difference in the sequence between the
colorectal cancer gene of the patient and the known colorectal cancer gene is indicative of a 
disease state or a propensity for a disease state, as outlined herein. 
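
The comparison step can be illustrated with a deliberately naive Python sketch that lists
position-by-position mismatches between two sequences that are already aligned and of equal
length. It is not a substitute for an alignment program such as Bestfit; the sequences and
the function name are invented for this example.

```python
def sequence_differences(reference, sample):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences must already be aligned and of
    equal length; this function does not perform an alignment.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]

wild_type = "ATGGCCAAGCTT"  # hypothetical known (wild-type) sequence
patient = "ATGGCTAAGCTA"    # hypothetical sequence from an individual
differences = sequence_differences(wild_type, patient)
```

An empty list means no difference was found over the compared region; any entries mark
positions that would warrant closer examination.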

[217] 

[218] In a preferred embodiment, the colorectal cancer genes are used as 
probes to determine the number of copies of the colorectal cancer gene in the genome. 

[219] In another preferred embodiment, colorectal cancer genes are used as
probes to determine the chromosomal localization of the colorectal cancer genes.
Information such as chromosomal localization finds use in providing a diagnosis or prognosis 
in particular when chromosomal abnormalities such as translocations, and the like are 
identified in colorectal cancer gene loci. 

[220] Thus, in one embodiment, methods of modulating colorectal cancer in 
cells or organisms are provided. In one embodiment, the methods comprise administering to 
a cell an anti-colorectal cancer antibody that reduces or eliminates the biological activity of 
an endogenous colorectal cancer protein. Alternatively, the methods comprise
administering to a cell or organism a recombinant nucleic acid encoding a colorectal cancer
protein. As will be appreciated by those in the art, this may be accomplished in any number
of ways. In a preferred embodiment, for example when the colorectal cancer sequence is
down-regulated in colorectal cancer, the activity of the colorectal cancer gene is increased
by increasing the amount of colorectal cancer protein in the cell, for example by
overexpressing the endogenous colorectal cancer gene or by administering a gene encoding
the colorectal cancer sequence, using known gene-therapy techniques, for example. In a
preferred embodiment, the gene therapy techniques include the incorporation of the
exogenous gene using enhanced homologous recombination (EHR), for example as described in
PCT/US93/03868, hereby incorporated by reference in its entirety. Alternatively, for example
when the colorectal cancer sequence is up-regulated in colorectal cancer, the activity of the
endogenous colorectal cancer gene is decreased, for example by the administration of a
colorectal cancer antisense nucleic acid.



[221] In one embodiment, the colorectal cancer proteins of the present 
invention may be used to generate polyclonal and monoclonal antibodies to colorectal cancer 
proteins, which are useful as described herein. Similarly, the colorectal cancer proteins can 
be coupled, using standard technology, to affinity chromatography columns. These columns 
may then be used to purify colorectal cancer antibodies. In a preferred embodiment, the 
antibodies are generated to epitopes unique to a colorectal cancer protein; that is, the 
antibodies show little or no cross-reactivity to other proteins. These antibodies find use in a 
number of applications. For example, the colorectal cancer antibodies may be coupled to 
standard affinity chromatography columns and used to purify colorectal cancer proteins. The 
antibodies may also be used as blocking polypeptides, as outlined above, since they will 
specifically bind to the colorectal cancer protein. 

[222] In one embodiment, a therapeutically effective dose of a colorectal 
cancer or modulator thereof is administered to a patient. By "therapeutically effective dose" 
herein is meant a dose that produces the effects for which it is administered. The exact dose 
will depend on the purpose of the treatment, and will be ascertainable by one skilled in the art 
using known techniques. As is known in the art, adjustments for colorectal cancer 
degradation, systemic versus localized delivery, and rate of new protease synthesis, as well as 
the age, body weight, general health, sex, diet, time of administration, drug interaction and 
the severity of the condition may be necessary, and will be ascertainable with routine 
experimentation by those skilled in the art. 



[223] A "patient" for the purposes of the present invention includes both 
humans and other animals, particularly mammals, and organisms. Thus the methods are 
applicable to both human therapy and veterinary applications. In the preferred embodiment 
the patient is a mammal, and in the most preferred embodiment the patient is human. 

[224] The administration of the colorectal cancer proteins and modulators 
of the present invention can be done in a variety of ways as discussed above, including, but 
not limited to, orally, subcutaneously, intravenously, intranasally, transdermally, 
intraperitoneally, intramuscularly, intrapulmonary, vaginally, rectally, or intraocularly. In 
some instances, for example, in the treatment of wounds and inflammation, the colorectal 
cancer proteins and modulators may be directly applied as a solution or spray. 

[225] The pharmaceutical compositions of the present invention comprise a 
colorectal cancer protein in a form suitable for administration to a patient. In the preferred 
embodiment, the pharmaceutical compositions are in a water soluble form, such as being 
present as pharmaceutically acceptable salts, which is meant to include both acid and base 
addition salts. "Pharmaceutically acceptable acid addition salt" refers to those salts that retain 
the biological effectiveness of the free bases and that are not biologically or otherwise 
undesirable, formed with inorganic acids such as hydrochloric acid, hydrobromic acid, 
sulfuric acid, nitric acid, phosphoric acid and the like, and organic acids such as acetic acid, 
propionic acid, glycolic acid, pyruvic acid, oxalic acid, maleic acid, malonic acid, succinic 
acid, fumaric acid, tartaric acid, citric acid, benzoic acid, cinnamic acid, mandelic acid, 
methanesulfonic acid, ethanesulfonic acid, p-toluenesulfonic acid, salicylic acid and the like. 
"Pharmaceutically acceptable base addition salts" include those derived from inorganic bases 
such as sodium, potassium, lithium, ammonium, calcium, magnesium, iron, zinc, copper, 
manganese, aluminum salts and the like. Particularly preferred are the ammonium, 
potassium, sodium, calcium, and magnesium salts. Salts derived from pharmaceutically 
acceptable organic non-toxic bases include salts of primary, secondary, and tertiary amines, 
substituted amines including naturally occurring substituted amines, cyclic amines and basic 
ion exchange resins, such as isopropylamine, trimethylamine, diethylamine, triethylamine, 
tripropylamine, and ethanolamine. 

[226] The pharmaceutical compositions may also include one or more of the 
following: carrier proteins such as serum albumin; buffers; fillers such as microcrystalline 
cellulose, lactose, corn and other starches; binding agents; sweeteners and other flavoring 
agents; coloring agents; and polyethylene glycol. Additives are well known in the art, and 
are used in a variety of formulations. 

[227] In a preferred embodiment, colorectal cancer proteins and modulators 
are administered as therapeutic agents, and can be formulated as outlined above. Similarly, 
colorectal cancer genes (including both the full-length sequence, partial sequences, or 
regulatory sequences of the colorectal cancer coding regions) can be administered in gene 
therapy applications, as is known in the art. These colorectal cancer genes can include 
antisense applications, either as gene therapy (i.e. for incorporation into the genome) or as 
antisense compositions, as will be appreciated by those in the art. 

[228] In a preferred embodiment, colorectal cancer genes are administered 
as DNA vaccines, either single genes or combinations of colorectal cancer genes. Naked 
DNA vaccines are generally known in the art. Brower, Nature Biotechnology, 16:1304-1305 
(1998). 

[229] In one embodiment, colorectal cancer genes of the present invention 
are used as DNA vaccines. Methods for the use of genes as DNA vaccines are well known to 
one of ordinary skill in the art, and include placing a colorectal cancer gene or portion of a 
colorectal cancer gene under the control of a promoter for expression in a colorectal cancer 
patient. The colorectal cancer gene used for DNA vaccines can encode full-length colorectal 
cancer proteins, but more preferably encodes portions of the colorectal cancer proteins 
including peptides derived from the colorectal cancer protein. In a preferred embodiment a 
patient is immunized with a DNA vaccine comprising a plurality of nucleotide sequences 
derived from a colorectal cancer gene. Similarly, it is possible to immunize a patient with a 
plurality of colorectal cancer genes or portions thereof as defined herein. Without being 
bound by theory, following expression of the polypeptide encoded by the DNA vaccine, 
cytotoxic T-cells, helper T-cells and antibodies are induced which recognize and destroy or 
eliminate cells expressing colorectal cancer proteins. 

[230] In a preferred embodiment, the DNA vaccines include a gene encoding 
an adjuvant molecule with the DNA vaccine. Such adjuvant molecules include cytokines that 
increase the immunogenic response to the colorectal cancer polypeptide encoded by the 
DNA vaccine. Additional or alternative adjuvants are known to those of ordinary skill in the 
art and find use in the invention. 

[231] In another preferred embodiment, colorectal cancer genes find use in 
generating animal models of colorectal cancer. As is appreciated by one of ordinary skill in 
the art, when the colorectal cancer gene identified is repressed or diminished in colorectal 
cancer tissue, gene therapy technology wherein antisense RNA is directed to the colorectal 
cancer gene will also diminish or repress expression of the gene. An animal generated as 
such serves as an animal model of colorectal cancer that finds use in screening bioactive 
drug candidates. Similarly, gene knockout technology, for example as a result of 
homologous recombination with an appropriate gene targeting vector, will result in the 
absence of the colorectal cancer protein. When desired, tissue-specific expression or 
knockout of the colorectal cancer protein may be necessary. 

[232] It is also possible that the colorectal cancer protein is overexpressed in 
colorectal cancer. As such, transgenic animals can be generated that overexpress the 
colorectal cancer protein. Depending on the desired expression level, promoters of various 
strengths can be employed to express the transgene. Also, the number of copies of the 
integrated transgene can be determined and compared for a determination of the expression 
level of the transgene. Animals generated by such methods find use as animal models of 
colorectal cancer and are additionally useful in screening for bioactive molecules to treat 
colorectal cancer. 

EXAMPLES 

[233] It is understood that the examples described herein in no way serve to 
limit the true scope of this invention, but rather are presented for illustrative purposes. All 
references and sequences of accession numbers cited herein are incorporated by reference in 
their entirety. 

[234] Example 1 

Tissue Preparation, Labeling Chips, and Fingerprints 




[235] Purify total RNA from tissue using TRIzol Reagent 
[236] Estimate tissue weight. Homogenize tissue samples in 1ml of TRIzol 
per 50mg of tissue using a Polytron 3100 homogenizer. The generator/probe used depends 
upon the tissue size. A generator that is too large for the amount of tissue to be homogenized 
will cause a loss of sample and lower RNA yield. Use the 20mm generator for tissue 
weighing more than 0.6g. If the working volume is greater than 2ml, then homogenize tissue 
in a 15ml polypropylene tube (Falcon 2059). Fill tube no greater than 10ml. 

HOMOGENIZATION 

[237] Before use, confirm that the generator was cleaned after its last use by 
running it through soapy H2O and rinsing thoroughly. Run through with EtOH to sterilize. 
Keep tissue frozen until ready. Add TRIzol directly to frozen tissue, then homogenize. 
[238] Following homogenization, remove insoluble material from the 
homogenate by centrifugation at 7500 x g for 15 min. in a Sorvall superspeed or 12,000 x g 
for 10 min. in an Eppendorf centrifuge at 4°C. Transfer the cleared homogenate to a new 
tube(s). The samples may be frozen now at -60 to -70°C (and kept for at least one month) or 
you may continue with the purification. 

PHASE SEPARATION 
[239] Incubate the homogenized samples for 5 minutes at room temperature. 
[240] Add 0.2ml of chloroform per 1ml of TRIzol reagent used in the 
original homogenization. 

[241] Cap tubes securely and shake tubes vigorously by hand (do not vortex) 
for 15 seconds. 

[242] Incubate samples at room temp. for 2-3 minutes. Centrifuge samples 
at 6500rpm in a Sorvall superspeed for 30 min. at 4°C. (You may spin at up to 12,000 x g 
for 10 min. but you risk breaking your tubes in the centrifuge.) 

RNA PRECIPITATION 

[243] Transfer the aqueous phase to a fresh tube. Save the organic phase if 
isolation of DNA or protein is desired. Add 0.5ml of isopropyl alcohol per 1ml of TRIzol 
reagent used in the original homogenization. Cap tubes securely and invert to mix. Incubate 
samples at room temp. for 10 minutes. Centrifuge samples at 6500rpm in Sorvall for 20min. 
at 4°C. 

RNA WASH 

[244] Pour off the supernate. Wash pellet with cold 75% ethanol. Use 1ml 
of 75% ethanol per 1ml of TRIzol reagent used in the initial homogenization. Cap tubes 
securely and invert several times to loosen pellet. (Do not vortex). Centrifuge at <8000rpm 
(<7500 x g) for 5 minutes at 4°C. 
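
The reagent volumes in the steps above all scale with the amount of TRIzol used, which in turn scales with tissue weight. A minimal sketch of that arithmetic follows; the function name and the 250 mg example are illustrative assumptions, while the ratios are the ones stated in the protocol text.

```python
def trizol_reagent_volumes_ml(tissue_mg):
    """Reagent volumes (ml) for a given tissue weight (mg).

    Ratios taken from the protocol text:
      1 ml TRIzol per 50 mg tissue,
      0.2 ml chloroform, 0.5 ml isopropyl alcohol and
      1 ml 75% ethanol per 1 ml TRIzol.
    """
    trizol_ml = tissue_mg / 50.0
    return {
        "trizol": trizol_ml,
        "chloroform": 0.2 * trizol_ml,
        "isopropyl_alcohol": 0.5 * trizol_ml,
        "ethanol_75_percent": 1.0 * trizol_ml,
    }


# Hypothetical 250 mg sample.
volumes = trizol_reagent_volumes_ml(250)
for reagent, ml in volumes.items():
    print(f"{reagent}: {ml:.2f} ml")
```

Because the working volume here exceeds 2ml, the protocol's advice to homogenize in a 15ml tube would apply to this example.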

[245] Pour off the wash. Carefully transfer pellet to an Eppendorf tube (let it 
slide down the tube into the new tube and use a pipet tip to help guide it in if necessary). 
Depending on the volumes you are working with, you can decide what size tube(s) you want 
to precipitate the RNA in. When I tried leaving the RNA in the large 15ml tube, it took so 
long to dry (i.e. it did not dry) that I eventually had to transfer it to a smaller tube. Let pellet 
dry in hood. Resuspend RNA in an appropriate volume of DEPC H2O. Try for 2-5ug/ul. 
Take absorbance readings. 

[246] Purify poly A+ mRNA from total RNA or clean up total RNA with 
Qiagen's RNeasy kit 

[247] Purification of poly A+ mRNA from total RNA. Heat Oligotex 
suspension to 37°C and mix immediately before adding to RNA. Incubate Elution Buffer at 
70°C. Warm up 2 x Binding Buffer at 65°C if there is precipitate in the buffer. Mix total 
RNA with DEPC-treated water, 2 x Binding Buffer, and Oligotex according to Table 2 on 
page 16 of the Oligotex Handbook. Incubate for 3 minutes at 65°C. Incubate for 10 minutes 
at room temperature. 



[248] Centrifuge for 2 minutes at 14,000 to 18,000 g. If centrifuge has a 
"soft setting," then use it. Remove supernatant without disturbing Oligotex pellet. A little bit 
of solution can be left behind to reduce the loss of Oligotex. Save sup until certain that 
satisfactory binding and elution of poly A+ mRNA has occurred. 



[249] Gently resuspend in Wash Buffer OW2 and pipet onto spin column. 
Centrifuge the spin column at full speed (soft setting if possible) for 1 minute. 

[250] Transfer spin column to a new collection tube and gently resuspend in 
Wash Buffer OW2 and centrifuge as described herein. 

[251] Transfer spin column to a new tube and elute with 20 to 100 ul of 
preheated (70°C) Elution Buffer. Gently resuspend Oligotex resin by pipetting up and down. 
Centrifuge as above. Repeat elution with fresh elution buffer or use first eluate to keep the 
elution volume low. 



[252] Read absorbance, using diluted Elution Buffer as the blank. 

[253] Before proceeding with cDNA synthesis, the mRNA must be 
precipitated. Some component leftover or in the Elution Buffer from the Oligotex 
purification procedure will inhibit downstream enzymatic reactions of the mRNA. 

Ethanol Precipitation 
[254] Add 0.4 vol. of 7.5 M NH4OAc + 2.5 vol. of cold 100% ethanol. 
Precipitate at -20°C 1 hour to overnight (or 20-30 min. at -70°C). Centrifuge at 14,000- 
16,000 x g for 30 minutes at 4°C. Wash pellet with 0.5ml of 80% ethanol (-20°C) then 
centrifuge at 14,000-16,000 x g for 5 minutes at room temperature. Repeat 80% ethanol 
wash. Dry the last bit of ethanol from the pellet in the hood. (Do not speed vacuum). 
Suspend pellet in DEPC H2O at 1ug/ul concentration. 
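
The precipitation volumes above can be worked out for a given eluate with a short sketch. It assumes that both "0.4 vol." and "2.5 vol." are measured against the starting sample volume, which the text does not state explicitly; the function name and the 100 ul example are illustrative.

```python
def ethanol_precipitation_volumes_ul(sample_ul):
    """Volumes (ul) of 7.5 M NH4OAc and cold 100% ethanol to add.

    Assumption: both multiples are relative to the starting sample
    volume (0.4 vol. NH4OAc, 2.5 vol. ethanol).
    """
    return {
        "nh4oac_7_5_molar": 0.4 * sample_ul,
        "ethanol_100_percent": 2.5 * sample_ul,
    }


# Hypothetical 100 ul eluate.
precipitation = ethanol_precipitation_volumes_ul(100)
total_ul = 100 + sum(precipitation.values())
print(precipitation, "total:", total_ul)
```

The total volume is worth checking against the tube size before starting, since it is nearly four times the sample volume.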

Clean up total RNA using Qiagen's RNeasy kit 
[255] Add no more than 100ug to an RNeasy column. Adjust sample to a 
volume of 100ul with RNase-free water. Add 350ul Buffer RLT then 250ul ethanol (100%) 
to the sample. Mix by pipetting (do not centrifuge) then apply sample to an RNeasy mini 
spin column. Centrifuge for 15 sec at >10,000rpm. If concerned about yield, re-apply 
flowthrough to column and centrifuge again. 

[256] Transfer column to a new 2-ml collection tube. Add 500ul Buffer RPE 
and centrifuge for 15 sec at >10,000rpm. Discard flowthrough. Add 500ul Buffer RPE and 
centrifuge for 15 sec at >10,000rpm. Discard flowthrough then centrifuge for 2 min at 
maximum speed to dry column membrane. Transfer column to a new 1.5-ml collection tube 
and apply 30-50ul of RNase-free water directly onto column membrane. Centrifuge 1 min at 
>10,000rpm. Repeat elution. 

[257] Take absorbance reading. If necessary, ethanol precipitate with 
ammonium acetate and 2.5X volume 100% ethanol. 

[258] Make cDNA using Gibco's "Superscript Choice System for cDNA 
Synthesis" kit 

First Strand cDNA Synthesis 
[259] Use 5ug of total RNA or 1ug of polyA+ mRNA as starting material. 
For total RNA, use 2ul of Superscript RT. For polyA+ mRNA, use 1ul of Superscript RT. 
Final volume of first strand synthesis mix is 20ul. RNA must be in a volume no greater than 
10ul. Incubate RNA with 1ul of 100pmol T7-T24 oligo for 10 min at 70C. On ice, add 7 ul 
of: 4ul 5X 1st Strand Buffer, 2ul of 0.1M DTT, and 1 ul of 10mM dNTP mix. Incubate at 
37C for 2 min then add Superscript RT. 

Incubate at 37C for 1 hour. 
Second Strand Synthesis 
Place 1st strand reactions on ice. 

Add: 
91ul DEPC H2O 
30ul 5X 2nd Strand Buffer 
3ul 10mM dNTP mix 
1ul 10U/ul E.coli DNA Ligase 
4ul 10U/ul E.coli DNA Polymerase 
1ul 2U/ul RNaseH 

[260] Make the above into a mix if there are more than 2 samples. Mix and 
incubate 2 hours at 16C. 
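
When the second strand components are combined into a mix for several samples, each volume is simply multiplied by the number of reactions. The sketch below shows that scaling. The per-reaction volumes come from the list above; the `overage` parameter (extra mix to cover pipetting loss) is an assumption added for illustration and is not part of the protocol.

```python
# Per-reaction volumes (ul) from the second strand synthesis list.
SECOND_STRAND_UL = {
    "DEPC H2O": 91.0,
    "5X 2nd Strand Buffer": 30.0,
    "10mM dNTP mix": 3.0,
    "E.coli DNA Ligase (10U/ul)": 1.0,
    "E.coli DNA Polymerase (10U/ul)": 4.0,
    "RNaseH (2U/ul)": 1.0,
}


def second_strand_master_mix(n_samples, overage=0.0):
    """Scale each component for n_samples reactions.

    overage is a hypothetical fractional excess (0.1 means 10% extra).
    """
    factor = n_samples * (1.0 + overage)
    return {name: round(ul * factor, 2) for name, ul in SECOND_STRAND_UL.items()}


per_reaction_ul = sum(SECOND_STRAND_UL.values())
mix_for_three = second_strand_master_mix(3)
print("mix per reaction:", per_reaction_ul, "ul")
print(mix_for_three)
```

Each first strand reaction would then receive the per-reaction amount of mix, 130 ul, on top of its 20 ul first strand volume.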



[261] Add 2ul T4 DNA Polymerase. Incubate 5 min at 16C. Add 10ul of 
0.5M EDTA. 



[262] Clean up cDNA 

[263] Phenol:Chloroform:Isoamyl Alcohol (25:24:1) purification using 
Phase-Lock gel tubes: 

[264] Centrifuge PLG tubes for 30 sec at maximum speed. Transfer cDNA 
mix to PLG tube. Add equal volume of phenol:chloroform:isoamyl alcohol and shake 
vigorously (do not vortex). Centrifuge 5 minutes at maximum speed. Transfer top aqueous 
solution to a new tube. Ethanol precipitate: add 7.5X 5M NH4OAc and 2.5X volume of 
100% ethanol. Centrifuge immediately at room temp. for 20 min, maximum speed. Remove 
sup then wash pellet 2X with cold 80% ethanol. Remove as much ethanol wash as possible 
then let pellet air dry. Resuspend pellet in 3ul RNase-free water. 

In vitro Transcription (IVT) and labeling with biotin 
Pipet 1.5ul of cDNA into a thin-wall PCR tube. 

Make NTP labeling mix: 

Combine at room temperature: 
2ul T7 10xATP (75mM) (Ambion) 
2ul T7 10xGTP (75mM) (Ambion) 
1.5ul T7 10xCTP (75mM) (Ambion) 
1.5ul T7 10xUTP (75mM) (Ambion) 
3.75ul 10mM Bio-11-UTP (Boehringer-Mannheim/Roche or Enzo) 
3.75ul 10mM Bio-16-CTP (Enzo) 
2ul 10x T7 transcription buffer (Ambion) 
2ul 10x T7 enzyme mix (Ambion) 

[265] Final volume of total reaction is 20ul. Incubate 6 hours at 37C in a 
PCR machine. 

RNeasy clean-up of IVT product 
[266] Follow previous instructions for RNeasy columns or refer to Qiagen's 
RNeasy protocol handbook. 
[267] cRNA will most likely need to be ethanol precipitated. Resuspend in 
a volume compatible with the fragmentation step. 



Fragmentation 

[268] 15 ug of labeled RNA is usually fragmented. Try to minimize the 
fragmentation reaction volume; a 10 ul volume is recommended but 20 ul is all right. Do not 
go higher than 20 ul because the magnesium in the fragmentation buffer contributes to 
precipitation in the hybridization buffer. 
[269] Fragment RNA by incubation at 94 C for 35 minutes in 1 x 
Fragmentation buffer. 

5 x Fragmentation buffer: 
200 mM Tris-acetate, pH 8.1 
500 mM KOAc 
150 mM MgOAc 
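
Since the reaction is run in 1 x buffer made from the 5 x stock above, one fifth of the reaction volume is stock buffer and each component ends up at one fifth of its stock concentration. A minimal sketch of that calculation follows; the function name and the 20 ul example are illustrative assumptions.

```python
# 5 x Fragmentation buffer composition (mM), as listed in the text.
FRAGMENTATION_BUFFER_5X_MM = {
    "Tris-acetate, pH 8.1": 200.0,
    "KOAc": 500.0,
    "MgOAc": 150.0,
}


def fragmentation_setup(reaction_ul):
    """Return (ul of 5 x buffer, final mM of each component at 1 x)."""
    if reaction_ul > 20:
        # The text advises against exceeding 20 ul.
        raise ValueError("fragmentation reaction should not exceed 20 ul")
    buffer_ul = reaction_ul / 5.0
    final_mm = {name: mm / 5.0 for name, mm in FRAGMENTATION_BUFFER_5X_MM.items()}
    return buffer_ul, final_mm


buffer_ul, final_mm = fragmentation_setup(20)
print(buffer_ul, "ul of 5 x buffer;", final_mm)
```

The remaining reaction volume is available for the labeled RNA and water.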

[270] The labeled RNA transcript can be analyzed before and after 
fragmentation. Samples can be heated to 65C for 15 minutes and electrophoresed on 1% 
agarose/TBE gels to get an approximate idea of the transcript size range. 



Hybridization 

[271] 200 ul (10ug cRNA) of a hybridization mix is put on the chip. If 
multiple hybridizations are to be done (such as cycling through a 5 chip set), then it is 
recommended that an initial hybridization mix of 300 ul or more be made. 

Hybridization Mix: 
fragment labeled RNA (50ng/ul final conc.) 
50 pM 948-b control oligo 
1.5 pM BioB 
5 pM BioC 
25 pM BioD 
100 pM CRE 
0.1mg/ml herring sperm DNA 
0.5mg/ml acetylated BSA 
to 300 ul with 1xMES hyb. buffer 

[272] The instruction manuals for the products used herein are incorporated 
herein in their entirety. 

Labeling Protocol Provided Herein 

Hybridization reaction: 
Start with non-biotinylated IVT (purified by RNeasy columns) 
(see example 1 for steps from tissue to IVT) 

IVT antisense RNA; 4 ug: ul 
Random Hexamers (1 ug/ul): 4 ul 
H2O: ul 
Total: 14 ul 

- Incubate 70°C, 10 min. Put on ice. 

Reverse transcription: 

5X First Strand (BRL) buffer: 6 ul 
0.1M DTT: 3 ul 
50X dNTP mix: 0.6 ul 
H2O: 2.4 ul 
Cy3 or Cy5 dUTP (1mM): 3 ul 
SS RT II (BRL): 1 ul 
Total: 16 ul 

- Add to hybridization reaction. 

- Incubate 30 min., 42°C. 

- Add 1 ul SSII and let go for another hour. 
Put on ice. 

- 50X dNTP mix (25mM of cold dATP, dCTP, and dGTP, 10mM of dTTP: 25 ul 
each of 100mM dATP, dCTP, and dGTP; 10 ul of 100mM dTTP to 15 ul H2O. dNTPs 
from Pharmacia) 

RNA degradation: 

- Add 1.5 ul 1M NaOH/2mM EDTA, incubate at 65°C, 10 min. 

1M NaOH/2mM EDTA: 
86 ul H2O 
10 ul 10N NaOH 
4 ul 50mM EDTA 

U-Con 30 

500 ul TE/sample, spin at 7000g for 10 min, save flow through for purification 

Qiagen purification: 

- suspend u-con recovered material in 500 ul buffer PB 
- proceed w/ normal Qiagen protocol 

DNAse digest: 

- Add 1 ul of 1/100 dil of DNAse/30 ul Rx and incubate at 37°C for 15 min. 
- 5 min 95°C to denature enzyme 

Sample preparation: 

- Add: 
CoM DNA: 10 ul 
50X dNTPs: 1 ul 
Na pyro phosphate: 7.5 ul 
10mg/ml Herring sperm DNA: 1 ul of 1/10 dilution 
21.8 final vol. 

- Dry down in speed vac. 
- Resuspend in 15 ul H2O. 
- Add 0.38 ul 10% SDS. 
- Heat 95°C, 2 min. 
- Slow cool at room temp. for 20 min. 

Put on slide and hybridize overnight at 64°C. 



Washing after the hybridization: 

3X SSC/0.03% SDS: 2 min. 37.5 ml 20X SSC + 0.75 ml 10% SDS in 250 ml H2O 
1X SSC: 5 min. 12.5 ml 20X SSC in 250 ml H2O 
0.2X SSC: 5 min. 2.5 ml 20X SSC in 250 ml H2O 

Dry slides in centrifuge, 1000 RPM, 1 min. 
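
The stock volumes in the wash recipes above follow from the usual dilution relation C1 x V1 = C2 x V2. The sketch below reproduces them. It treats 250 ml as the final volume of each wash, which is an approximation of "in 250 ml H2O"; the function name is illustrative.

```python
def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Volume of stock needed so that C1 * V1 = C2 * V2.

    Concentrations may be in any unit (X, %) as long as stock and
    final use the same one.
    """
    return final_conc * final_volume_ml / stock_conc


WASH_VOLUME_ML = 250  # assumed final volume of each wash solution

ssc_for_3x = stock_volume_ml(20, 3, WASH_VOLUME_ML)       # 20X SSC for 3X wash
sds_for_3x = stock_volume_ml(10, 0.03, WASH_VOLUME_ML)    # 10% SDS for 0.03%
ssc_for_1x = stock_volume_ml(20, 1, WASH_VOLUME_ML)       # 20X SSC for 1X wash
ssc_for_0_2x = stock_volume_ml(20, 0.2, WASH_VOLUME_ML)   # 20X SSC for 0.2X wash

print(ssc_for_3x, sds_for_3x, ssc_for_1x, ssc_for_0_2x)
```

The computed values agree with the 37.5 ml, 0.75 ml, 12.5 ml and 2.5 ml quantities given in the wash list.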

[273] Scan using appropriate Photomultiplier tube (PMT) and fluorescent 
excitation and emission channels. 

[274] The results are shown in Table 1 and Table 2. The lists of genes come 
from colorectal tumors from a variety of stages of the disease. The genes that are 
up-regulated in the tumors (overall) were also found to be expressed at a limited amount or not at 
all in the body map. The body map consists of at least 28 tissue types, including Adrenal 
Gland, Bladder, Bone Marrow, Brain, Breast, Cervix, Colon, Diaphragm, Heart, Kidney, 
Liver, Lung, Lymph Node, Muscle, Pancreas, Prostate, Rectum, Salivary Gland, Skin, Small 
Intestine, Spinal Cord, Spleen, Stomach, Testis, Thymus, Thyroid, Trachea and Uterus. As 
indicated, some of the Accession numbers include expressed sequence tags (ESTs). Thus, in 
one embodiment herein, genes within an expression profile, also termed expression profile 
genes, include ESTs and are not necessarily full length. 

[275] Table 1 shows Accession numbers for 1747 genes upregulated in colon 
tumor tissue. The table provides the exemplar accession numbers, Unigene ID numbers, 
unique Eos codes, descriptions of the genes encoded, and relative amount of expression as 
compared with expression in other normal body tissue. 
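
For readers who want to work with Table 1 programmatically, the following sketch parses one single-line table row into the fields described above. The regular expression is an assumption about the row layout (six-digit Pkey, Eos code, GenBank-style accession, optional UniGene ID, free-text title, trailing ratio); rows whose exemplar is a genomic prediction span two lines and are deliberately not handled.

```python
import re

# Assumed layout of a single-line Table 1 row.
ROW_PATTERN = re.compile(
    r"^(?P<pkey>\d{6})\s+"
    r"(?P<probeset>EOS\d{5})\s+"
    r"(?P<accession>[A-Z]{1,2}\d{5,6})\s+"
    r"(?:(?P<unigene>Hs\.\d+)\s+)?"   # absent for entries not in UniGene
    r"(?P<title>.+?)\s+"
    r"(?P<ratio>\d+\.\d+)$"
)


def parse_table1_row(line):
    """Return a dict of fields, or None if the line is not a simple row."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["ratio"] = float(row["ratio"])
    return row


with_unigene = parse_table1_row(
    "332716 EOS32647 L00058 Hs.79070 "
    "v-myc avian myelocytomatosis viral oncogene homolog 15.0"
)
without_unigene = parse_table1_row(
    "322035 EOS21966 AL137517 EST cluster (not in UniGene) 2.8"
)
print(with_unigene)
print(without_unigene)
```

Rows that return None can be collected and inspected by hand rather than guessed at.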

TABLE 1. GENES INVOLVED IN COLORECTAL CANCER 

Pkey: Primekey (unique probeset identifier) 
Ex. Accn.: Exemplar accession number 
Probeset: Eos Code number 
UniG ID: UniGene number 

Pkey Probeset Ex Accn UniG ID UniGene Title Ratio TumMet/Body 

332264 EOS32195 N72849 Hs.115263 epiregulin 17.6 
332716 EOS32647 L00058 Hs.79070 v-myc avian myelocytomatosis viral oncogene homolog 15.0 
312845 EOS12776 AI9M215 Hs.186555 ESTs 14.3 

310257 EOS10188 AW389247 Hs.146826 ESTs 11.6 
322567 EOS22498 AF155108 EST cluster (not in UniGene) 11.5 
331060 EOS30991 N75081 Hs.21648 ESTs 10.3 
322303 EOS22234 W07459 EST cluster (not in UniGene) 9.6 
301891 EOS01822 AF131855 Hs.106127 Homo sapiens clone 25056 mRNA sequence 9.5 
318524 EOS18455 AW291511 Hs.253687 ESTs 8.9 
314001 EOS13932 AW168495 Hs.8750 ESTs 7.8 
331183 EOS31114 T40769 Hs.8469 EST 7.3 
315429 EOS15360 AW009951 Hs.206692 ESTs 7.3 
303344 EOS03275 AA255977 Hs.250646 ESTs; Highly similar to ubiquitin-conjugating enzyme [M.musculus] 6.7 
313625 EOS13556 AW468402 Hs.254020 ESTs 6.7 
307084 EOS07015 AI160527 EST singleton (not in UniGene) with exon hit 6.1 
314943 EOS14874 AI476797 Hs.184572 cell division cycle 2; G1 to S and G2 to M 6.1 
303753 EOS03684 AW503733 Hs.170315 ESTs 5.7 
315593 EOS15524 AW198103 Hs.158154 ESTs 5.3 
313604 EOS13535 AI745325 Hs.182286 ESTs; Moderately similar to !!!! ALU SUBFAMILY SB2 WARNING ENTRY !!!! [H.sapiens] 5.1 
312319 EOS12250 AA216698 Hs.180780 Homo sapiens agrin precursor mRNA; partial cds 5.1 
312614 EOS12545 AI766732 Hs.201194 ESTs 4.6 
323176 EOS23107 AW071648 Hs.123199 ESTs 4.8 
317916 EOS17847 AI565071 Hs.159983 ESTs 4.7 
301846 EOS01777 R20002 Hs.6823 ESTs; Weakly similar to intrinsic factor-B12 receptor precursor [H.sapiens] 4.6 
311157 EOS11088 AI990122 Hs.196988 ESTs 4.6 
332640 EOS32571 AA417152 Hs.5101 protein regulator of cytokinesis 1 4.6 
311728 EOS11659 AW083000 Hs.184776 ribosomal protein L23a 4.5 
313774 EOS13705 AW136836 Hs.144583 ESTs 4.5 
312339 EOS12270 AA524394 EST cluster (not in UniGene) 4.4 
315369 EOS15300 AA764918 Hs.256531 ESTs 4.3 
303756 EOS03687 AJ738488 Hs.115838 ESTs 4.3 
301050 EOS00981 AW136973 Hs.144475 ESTs; Weakly similar to mitogen inducible gene mig-2 [H.sapiens] 4.3 
300319 EOS00250 AW157646 Hs.153506 ESTs; Weakly similar to microtubule-actin crosslinking factor [M.musculus] 4.3 
300664 EOS00595 AI444628 Hs.256809 ESTs 4.3 
302655 EOS02586 AJ227892 EST cluster (not in UniGene) with exon hit 4.1 
315175 EOS15106 AI025842 Hs.152530 ESTs 4.1 
330766 EOS30717 D60374 Hs.258712 EST 4.1 
310875 EOS10806 T47764 Hs.132917 ESTs 4.1 
313425 EOS13356 AA745689 Hs.166838 ESTs; Weakly similar to similar to zinc finger 5 protein from Gallus gallus; U51640 [H.sapiens] 4.0 
301804 EOS01735 AA581004 EST cluster (not in UniGene) with exon hit 4.0 
332203 EOS32134 H49388 Hs.102082 EST 3.9 
322968 EOS22899 AI905228 EST cluster (not in UniGene) 3.8 
321524 EOS21455 N79126 EST cluster (not in UniGene) 3.8 
302476 EOS02407 AF182294 EST cluster (not in UniGene) with exon hit 3.8 
303295 EOS03226 AA205625 Hs.208067 ESTs 3.8 
310016 EOS09947 AW449612 Hs.152475 ESTs 3.7 
324871 EOS24802 AW297755 Hs.148832 ESTs 3.7 
322887 EOS22818 AI986306 Hs.233460 ESTs; Weakly similar to KIAA0969 protein [H.sapiens] 3.7 
313171 EOS13102 N67879 Hs.157695 ESTs 3.7 
321638 EOS21569 AI356352 Hs.108932 ESTs 3.7 
320445 EOS20376 R33916 EST cluster (not in UniGene) 3.6 
302149 EOS02080 AI383794 Hs.152337 protein arginine N-methyltransferase 3 (hnRNP methyltransferase S. cerevisiae)-like 3 3.6 
316905 EOS16836 AW138241 Hs.210846 ESTs 3.6 
313166 EOS13097 AI801098 Hs.151500 ESTs 3.6 
323338 EOS23269 R74219 Hs.23348 S-phase kinase-associated protein 2 (p45) 3.5 
311434 EOS11365 AW016607 Hs.201582 ESTs 3.5 
312742 EOS12673 AI650363 Hs.116462 ESTs 3.4 
323587 EOS23518 AI905527 Hs.141901 ESTs; Moderately similar to !!!! ALU SUBFAMILY SP WARNING ENTRY !!!! [H.sapiens] 3.4 
317390 EOS17321 AW136551 Hs.161245 ESTs 3.4 
315282 EOS15213 AI222165 Hs.144923 ESTs 3.4 
318565 EOS18496 AI440137 Hs.164989 ESTs 3.4 
307586 EOS07517 AI285499 EST singleton (not in UniGene) with exon hit 3.4 
321052 EOS20983 AW372884 Hs.240770 nuclear cap binding protein subunit 2; 20kD 3.3 
324338 EOS24269 AL138357 Hs.247514 ESTs 3.3 
307517 EOS07448 AI275055 Hs.164989 ESTs 3.3 
314852 EOS14783 AI903735 Hs.137527 ESTs; Weakly similar to X-linked retinopathy protein [H.sapiens] 3.3 
324657 EOS24588 AW451142 Hs.255628 ESTs 3.2 
314912 EOS14843 AI431345 Hs.161784 ESTs 3.2 
324790 EOS24721 AI334367 Hs.159337 ESTs 3.2 
315498 EOS15429 AA628539 Hs.116252 ESTs; Moderately similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 3.2 
312857 EOS12788 AA772279 Hs.126914 ESTs 3.2 
300762 EOS00693 AI497778 Hs.168053 ESTs 3.2 
325587 EOS25518 c12_hs gi|6682462|ref| gn 1 + 126724 126967 ex 7 7 CDSl 2.44 244 3099 
CH.12_hs gi|6682462 3.2 
320654 EOS20585 AW263066 Hs.118112 ESTs 3.2 
316715 EOS16646 AI440266 Hs.170673 ESTs 3.1 
333279 EOS33210 CH22_522FG_126_1_UNK_EM:AC005500.GENSCAN.8-1 
CH22_FGENES.126_1 3.1 
309669 EOS09620 AW236171 Hs.181357 laminin receptor 1 (67kD; ribosomal protein SA) 3.1 
323846 EOS23777 AA337621 Hs.137635 ESTs 3.1 
324676 EOS24609 AI990739 Hs.236511 ESTs; Moderately similar to RNA splicing-related protein [R.norvegicus] 3.1 
308362 EOS08293 AI613519 EST singleton (not in UniGene) with exon hit 3.1 
308615 EOS08546 AI738593 EST singleton (not in UniGene) with exon hit 3.0 
315397 EOS15328 AA218940 Hs.137516 ESTs 3.0 
302236 EOS02167 AI128606 Hs.167558 zinc finger protein 161 3.0 
321693 EOS21624 AA700017 Hs.173737 ras-related C3 botulinum toxin substrate 1 (rho family; small GTP binding protein Rac1) 3.0 
330814 EOS30745 AA015730 Hs.247277 ESTs; Weakly similar to transformation-related protein [H.sapiens] 3.0 
302977 EOS02908 AW263124 EST cluster (not in UniGene) with exon hit 3.0 
327516 EOS27447 c_2_hs gi|6117815|ref| gn 6 199078 199216 ex 4 4 CDSl 9.15 139 1551 
CH.02_hs gi|6117815 2.9 
333278 EOS33209 CH22_521FG_125_2_UNK_EM:AC005500.GENSCAN.7-2 
CH22_FGENES.125_2 2.9 

302088 EOS02019 U77629 Hs.135639 achaete-scute complex (Drosophila) homolog-like 2 2.9 

322718 EOS22649 AF1 50270 Hs.233322 ESTs; Weakly similar to cDNA EST EMBLT01 156 comes from this gene [C.elegans] 2.9 

5 329154 EOS29085 c x_hs gi|5668686|ref| gn 2 - 200851 201356 ex 1 3 COSI 30.28 506 1812 

CH.X.hs gi|5668686 2.9 

315978 EOS 15909 AA830893 Hs. 11 9769 ESTs 2.9 

302677 EOSQ2608 H63227 Hs. 132880 ESTs; Highly similar to ubiquitin-conjugating enzyme [M.musculus] 2.9 

315007 EOS14938 AI806583 Hs. 125291 ESTs 2.9 

10 303780 EOS03711 AI424014 Hs.243450 ESTs; Moderately similar to KIAA0456 protein [H .sapiens] 2.9 

331362 EOS31293 AA417956 Hs.40782 ESTs 2.9 

335815 EOS35746 CH22_3187FG_618_3_UNK_EM:AC005500.GENSCAN.510-3 

CH22_FGENES.618_3 2.8 

332070 EOS32001 AA598545 Hs.228138 EST 2.8 

15 315720 EOS15651 AW291875 Hs.163900 ESTs 2.8 

311913 EOS11844 AI358522 Hs.221417 ESTs 2.8 

331014 EOS30945 H98597 Hs.30340 ESTs 2.8 

322035 EOS21966 AL137517 EST cluster (not in UniGene) 2.8 

338057 EOS37988 CH22_6558FG_UNK_EM:AC005500.GENSCAN.160-1

CH22_EM:AC005500.GENSCAN.160-1 2.8

335829 EOS35760 CH22_3202FG_620_3_UNK_EM:AC005500.GENSCAN.512-3

CH22_FGENES.620_3 2.8 

312136 EOS12067 AW451469 Hs.209990 ESTs 2.8 

303132 EOS03063 AI929819 Hs.193330 ESTs 2.8

317548 EOS17479 AI654187 Hs.195704 ESTs 2.8

325585 EOS25516 c12_hs gi|6682462|ref| gn 1 + 73476 73574 ex 5 7 CDSi 8.52 99 309

CH.12_hs gi|6682462 2.7

334631 EOS34562 CH22_1939FG_416_7_UNK_EM:AC005500.GENSCAN.277-7

CH22_FGENES.416_7 2.7

329156 EOS29087 c_x_hs gi|5868686|ref| gn 2 - 202013 202341 ex 3 3 CDSf 10.23 329 1814

CH.X_hs gi|5868686 2.7

318615 EOS18546 AJ133617 Hs.191088 ESTs 2.7

300734 EOS00665 AW205197 Hs_>40951 ESTs 2.7 

324430 EOS24361 AA464018 EST cluster (not in UniGene) 2.7

322296 EOS22227 W76326 Hs.251937 ESTs 2.7

303842 EOS03773 AI337304 Hs.126268 ESTs; Weakly similar to similar to PDZ domain [C.elegans] 2.7

320909 EOS20840 D62269 EST cluster (not in UniGene) 2.7

325195 EOS25126 T20258 Hs.171443 ESTs; Weakly similar to actin binding protein MAYVEN [H.sapiens] 2.7

324959 EOS24890 AW367745 Hs.143137 ESTs 2.7

309997 EOS09928 AI291621 Hs.145199 ESTs 2.7

329367 EOS29298 c_x_hs gi|5868842|ref| gn 1 - 87201 87587 ex 1 4 CDSl 8.13 387 3908

CH.X_hs gi|5868842 2.7

316697 EOS16628 AW293174 Hs.252627 ESTs 2.7 

313600 EOS13531 AA429564 Hs.185802 ESTs 2.7

301471 EOS01402 AA995014 Hs.129544 ESTs; Weakly similar to ORF YLL027W [S.cerevisiae] 2.6

300810 EOS00741 AI076890 Hs.186949 ESTs 2.6

319976 EOS19907 N48809 Hs.250824 ESTs 2.6

313434 EOS13365 W92070 Hs.231902 ESTs 2.6

333849 EOS33780 CH22_1118FG_290_8_UNK_EM:AC005500.GENSCAN.146-7

CH22_FGENES.290_8 2.6

330744 EOS30675 AA406142 Hs.12393 dTDP-D-glucose 4;6-dehydratase 2.6

309398 EOS09329 AW081820 EST singleton (not in UniGene) with exon hit 2.6

338727 EOS38658 CH22_7523FG_UNK_EM:AC005500.GENSCAN.500-2

CH22_EM:AC005500.GENSCAN.500-2 2.6

324620 EOS24551 AA448021 EST cluster (not in UniGene) 2.6

335755 EOS35686 CH22_3122FG_604_4_UNK_EM:AC005500.GENSCAN.493-9

CH22_FGENES.604_4 2.6 

315858 EOS15789 AA737345 EST cluster (not in UniGene) 2.6 

307288 EOS07219 AI205169 EST singleton (not in UniGene) with exon hit 2.5

330542 EOS30473 U23942 Hs.226213 cytochrome P450; 51 (lanosterol 14-alpha-demethylase) 2.5

335896 EOS35827 CH22_3273FG_635_4_UNK_EM:AC005500.GENSCAN.525-6

CH22_FGENES.635_4 2.5

316578 EOS16509 AA775623 Hs.211683 ESTs 2.5

329193 EOS29124 c_x_hs gi|5868716|ref| gn 3 + 168095 168181 ex 9 9 CDSl -1.11 87 2064

CH.X_hs gi|5868716 2.5

315193 EOS15124 AI241331 Hs.131765 ESTs 2.5

319478 EOS19409 R06841 EST cluster (not in UniGene) 2.5

334727 EOS34658 CH22_2038FG_424_1_UNK_EM:AC005500.GENSCAN.285-3

CH22_FGENES.424_1 2.5

328113 EOS28044 c_6_hs gi|5868024|ref| gn 2 - 80378 80491 ex 2 3 CDSi 3.89 114 3247

CH.06_hs gi|5868024 2.5

315214 EOS15145 AI915927 Hs.34771 ESTs 2.5

324718 EOS24649 AI557019 Hs.116467 ESTs 2.5

313326 EOS13257 AI088120 Hs.122329 ESTs 2.5

319480 EOS19411 R06933 Hs.184221 ESTs 2.5

317902 EOS17833 AI828602 Hs.211265 ESTs 2.5

323341 EOS23272 AL134875 Hs.192386 ESTs 2.5

336003 EOS35934 CH22_3385FG_664_4_UNK_DJ32I10.GENSCAN.5-4

CH22_FGENES.664_4 2.5

322992 EOS22923 AA142891 Hs.193165 ESTs 2.5

314911 EOS14842 AW292329 Hs.163481 ESTs 2.5

313603 EOS13534 AW468119 EST cluster (not in UniGene) 2.5 

306469 EOS06400 AA983792 EST singleton (not in UniGene) with exon hit 2.5 

324715 EOS24646 AI739168 EST cluster (not in UniGene) 2.5 

302455 EOS02386 AA356923 Hs.240770 nuclear cap binding protein subunit 2; 20kD 2.4

321023 EOS20954 H25135 Hs.125608 ESTs 2.4
302099
314092 
316587 
303702 
301822 
322694 
323333 
301954 
331363 
303811 
308243 
336021 



320807 
328903 



303597 
305898 
304439 
301604 
315071 
330565 
331589 
303216 
324988 
312996 
332314 
313325 
322991 
335496 

315135 
319488 
323571 
322826 
322221 
312242 
315238 
315168 
300504 
323243 
331628 
320746 
324598 
308667 
302944 
316291 
315296 
334150 

331380 
321795 
331493 
312890 
315583 
314306 
314138 
302656 
313564 
332792 

332020 
315143 
313385 
323835 
314014 
336016 

323218 
338059 

302613 
304852 
308457 
311736 
334183 

315021 
303013 
315006 



EOS02030 
EOS14023
EOS18518 
EOS03633 
EOS01753 
EOS22625 
EOS23264 
EOS01885 
EOS31294 
EOS03742 
EOS08174 
EOS35952 



334789 EOS34720 



EOS20738 
EOS28834 



338759 EOS38690 



333769 EOS33700 



EOS03528 
EOS05829 
EOS04370 
EOS01535 
EOS15002 
EOS30496 
EOS31520 
EOS03147 
EOS24919 
EOS12927 
EOS32245 
EOS13256 
EOS22922 
EOS35427 

EOS15066 
EOS19419
EOS23502 
EOS22757 
EOS22152 
EOS12173 
EOS15169 
EOS15099 
EOS00435 
EOS23174 
EOS31559 
EOS20677 
EOS24529 
EOS08598 
EOS02875 
EOS16222 
EOS15227 
EOS34081 

EOS31311 
EOS21726 
EOS31424 
EOS12821 
EOS15514 
EOS14237 
EOS14069 
EOS02587 
EOS13495 
EOS32723 

EOS31951 
EOS15074 
EOS13316 
EOS23766 
EOS13945 
EOS35947 

EOS23149 
EOS37990 

EOS02544 
EOS04783 
EOS08388 
EOS11667 
EOS34114 

EOS14952 
EOS02944 
EOS14937 



AL021397 Hs.137576 ribosomal protein L34 pseudogene 1
AI984040 Hs.226946 ESTs
AA779704 Hs.168830 ESTs

AW500748 Hs.224961 ESTs; Weakly similar to 73 kDA subunit of cleavage and polyadenylation specificity factor [H.sapiens]

X17033 Hs.1142 integrin; alpha 2 (CD49B; alpha 2 subunit of VLA-2 receptor)

AI110872 EST cluster (not in UniGene)

AA228883 EST cluster (not in UniGene) 

AJ009936 Hs.118138 nuclear receptor subfamily 1; group I; member 2

AA421562 Hs.91011 anterior gradient 2 (Xenopus laevis) homolog

AW182340 Hs.246155 ESTs; Weakly similar to DNA TOPOISOMERASE I [H.sapiens] 

AI560037 EST singleton (not in UniGene) with exon hit 

CH22_3404FG_669_10_UNK_DJ32I10.GENSCAN.9-15

CH22_FGENES.669_10
CH22_2101FG_432_14_UNK_EM:AC005500.GENSCAN.293-17

CH22_FGENES.432_14
AA086110 Hs.188536 Homo sapiens clone 24838 mRNA sequence
c_8_hs gi|5868514|ref| gn 1 + 23625 24468 ex 3 5 CDSi 91.18 844 219

CH.08_hs gi|5868514
CH22_7581FG__UNK_EM:AC005500.GENSCAN.517-6

CH22_EM:AC005500.GENSCAN.517-6
CH22_1036FG_271_8_UNK_EM:AC005500.GENSCAN.127-8

CH22_FGENES.271_8

ESTs; Weakly similar to brain mitochondrial carrier protein-1 [H.sapiens]
keratin 8 

EST singleton (not in UniGene) with exon hit 
ESTs; Weakly similar to C17G10.1 [C.elegans] 
ESTs 

caudal type homeo box transcription factor 1 
ESTs 
ESTs 

EST cluster (not in UniGene) 
EST cluster (not in UniGene) 
ESTs 
ESTs 
ESTs 

CH22_2848FG_571_4_UNK_EM:AC005500.GENSCAN.460-25
CH22_FGENES.571_4
ESTs 

EST cluster (not in UniGene) 
c-Cbl-interacting protein
ESTs 

nucleosome assembly protein 1-like 1
ESTs 
ESTs 
ESTs 

ESTs; Weakly similar to Lim kinase [H.sapiens]
EST cluster (not in UniGene) 
ESTs 

EST cluster (not in UniGene) 
ESTs 

EST singleton (not in UniGene) with exon hit 

ESTs; Weakly similar to cyclic nucleotide-gated channel beta subunit [R.norvegicus] 
ESTs 
ESTs 

CH22_1429FG_339_1_UNK_EM:AC005500.GENSCAN.189-1

CH22_FGENES.339_1 
AA453266 Hs.246131 ESTs 
AI796896 Hs.222448 ESTs 
N34357 Hs.44571 ESTs 
AI813654 Hs.127478 ESTs
AW003622 Hs.126555 ESTs
AI697901 Hs.192425 ESTs
AA740616 EST cluster (not in UniGene) 

AW293005 Hs.220905 ESTs 
AA810141 Hs.192182 ESTs
CH22_8FG_3_2_UNK_C4G1.GENSCAN.3-2

CH22_FGENES.3_2 
Hs.105219 ESTs 
Hs.192734 
Hs.176711



AI792141 

AA872838 

AA398882 

AA373124 

AA552690 

U51095 

N71027 

AA581439 

T06997 

AA249018 

T25862 

AI420611 

C18965 



AA627561 

AW250340 

AA984133 

AI807883 

AI890619 

AI380207 

AA593867 

AA622130 

AW204624 

W44372 

R80965 

AA128302 

AA502659 

AI758754 

AA340708 

AW375974 

AA876905 



Hs.143560 
Hs.242463 

Hs.105837
Hs.152423
Hs.1545
Hs.41856
Hs.152328



Hs.101774 
Hs.127832
Hs.159473



Hs.192446 

Hs.153260
Hs.156932
Hs.179662
Hs.125276
Hs.170890
Hs.152524
Hs.192927 

Hs.204079 

Hs.163986

Hs.256204 
Hs.156704 
Hs.125286



AA488895 
AA878324 
AI032087
AL042005 
AW291847 



ESTs 
ESTs 

EST cluster (not in UniGene) 
Hs.121715 ESTs; Weakly similar to HP protein [H.sapiens] 
CH22_3399FG_669_5_UNK_DJ32I10.GENSCAN.9-10

CH22_FGENES.669_5 
AF131846 Hs.13396 Homo sapiens clone 25028 mRNA sequence
CH22_6561FG_UNK_EM:AC005500.GENSCAN.160-4

CH22_EM:AC005500.GENSCAN.160-4
AA371059 Hs.251636 ubiquitin specific protease 3
AA568595 EST singleton (not in UniGene) with exon hit

AI669859 EST singleton (not in UniGene) with exon hit

AA765897 EST cluster (not in UniGene)

CH22_1464FG_350_13_UNK_EM:AC005500,GENSCAN^09-16 

CH22_FGENES.350_13 
AA533447 EST cluster (not in UniGene)

F07898 Hs.214190 interleukin enhancer binding factor 1
AI538613 Hs.135657 ESTs
™5f !2S 3746S CH22.5803FG.828 3 CH22 FGENESB28.1

303276 EOS03207 AA43IS99 Hs 132799 ESft 

318617 EOS18S48 AW2472S? H.7«t! , . 22 

<- 330760 EOSsSS? S S ^^T" 6 2., 

5 319545 EOS19476 R83716 Hs.74355 §£ l\ 

312252 EOS12183 AI128388 Hs 143655 EStI 2 ' 1 

S IS! AW294Q20 ffiT— — *-Q«. 

io s is a? isas ^ we ^^''''«»^3UBP>« ( ,v J vv A „ NINGEMTnynil ii 

314778 EOS14709 £ ££ ffj* « **' "> » IP -*gml 2-1 

319233 EOS19164 R21054 Hsiillffi J 1 

15 335488 EOS35419 CH22_2840FG_570_20^LINK^EM^C005500.GENSCAN 460-1 5 ' 

334616 E0S34547 CH22.,923FG_4,,_, 5 _U^ 

on 306792 EOS06723 A1042426 c^-^^ 8 ?- 411 - 15 

301661 EOS01592 AI815558 EST singleton (not in UniGene) with exon hit 2.1

311332 EOS11263 AW292247 HS25S0S2 f|I, C,usler < no1 in UniQ e"e) with axon hit 2.1 

314785 EOS14716 AI538226 felXtM IItI V 

25 £ ES = S f 

323740 EOS23671 AA324643 Hs.246106 ESTs ? 
3360,9 EOS35950 C^^mTum^O^mC^ 



2.1 
2.1 
2.1 

2.1 



2.1 
2.1 



2.0 
2.0 



2.0 
2.0 



— — — — w. <uvnM. ; 

^ 314954 EOS148BS AA521381 Hs.187726 ™*- KENES ™-* 

JU 303037 EOS02968 AF1 18395 ^1 

n 302056 EOS01987 AJ457532 Hs 126082 ||I d ^ ln V raGen e) «"'»' exon hit 21 

J~ 315178 EOS1S109 AW362945 Hs.'l624S9 ||^ Moderalel > rs,mlarto «OSA26AS(M.n,uscul l iS] 2-1 

:| 5 S IS S^FGjSuS^ fl 

y s is sss. *«- |§3^^^^ 

M 311315 EOS11246 AW450536 Hs 209260 ||I. s,n 9 lelon C 01 ln ^ne) with exon hit P™uci iHsapans] 

£L 9M EOS119 ' 9 AW016096 Hs.138oT EOT? " 
l7*V 302638 EOS02569 AA463798 IteliSfioe ccr ... i0 

K a sss EL £S SS^-«---*p.-^ - 

ssz sss d? £5F ^^s^^^^^ 

a 5 sss- is ass w. -r d(D ^-^ s 

g s bsk i~ is 

^ Q 338454 EOS38385 CH^.7,28FG_UNk!Im:AC^500.GENSCAN.3 6 (M 

309700 EOS09631 AW241170 Hs 17Qfifii £^- EM * C0 °5500GENSCAN.36O4 

Q 330262 EOS30193 SPAJSSg* XSS^SS^T'^"' 

? 55 ™ M EOS24094 A" 46827 HS.1346S1 fs^*! 667 ' 884 

SSS IS S ft™ 11^*™"^^ % 

326757 EOS26688 c20_hs ^|6249610|ref| g„3 ^«W 7«7« 1 3CDS, 9.5267 14,6 20 

60 3 ™n I2X E953 %S 1? 2.0 

313635 EOS13566 AA507227 Hs.6390 ESTs 2.0

310027 EOS09958 AW449009 Hs.126647 ESTs 2.0

336662 EOS36593 CH22 413SFG 4 , CH22 FGENM , 2 0 

65 334648 EOS34579 ^Z^G^Ts^^^ 

308676 EOS08607 AI761036 CH22 FGENES.417.15 

312047 EOS11978 %£2Ss HS14258 E fl, 8in(tel0n < not in «*"«•) with exon hi. 2.0 

324826 EOS24757 AA704806 Hs: 43842 ESTs 2 0 

If) EOS2282 ° AA081924 HW1MI7 ESTs ?° 

70 316345 EOS16276 AW139408 Hs.Ts2940 ESTs 20 

313922 EOS13853 A1702038 Hs. 00057 isTs ? ° 

319423 EOS19354 T83024 Slim Ills f n ° 

7 , s sss s? Hs - ,2977a grr*,-* 

75 334223 EOS34154 ^^360.4^ 

302980 EOS02911 W93435 25*r P ? E ? EM8 ^ 4 
326460 EOS26391 c19_hs gi|5867400|ref| gn 3 - 142633 142935 ex 1 2 CDSl 19.03 303 1731 1.9



319962 EOS19893 H06350 Hs.135056 ^ 9 - hS "I 586740 " 

307064 EOS06995 AI149335 ra,J *"° "J. s . 1.9 

331608 EOS31S39 N89861 HS44162 |fl a "^ ' no y n Urt Gene) with exon hit 1.9 

g5 328,42 EOS28073 OJ-MMaS^,^^^^^ 

312527 EOS12458 AI695522 Hs.,9,27, ^- hS *l«68050
AA693880

AI915014

AI989963

R41791 

AI523875 

AI761786 



Hs.164235 
Hs.197505
Hs.36566 

Hs.204674 



318581 EOS18512 AA769058 EST cluster (not in UniGene) 

319979 EOS19910 AB018281 Hs.107479 KIAA0738 gene product 
336107 EOS36038 CH22_3496FG_696_3_UNK_DA59H18.GENSCAN.4-3 

CH22_FGENES.696_3 

305232 EOS05163 AA670052 Hs.195188 glyceraldehyde-3-phosphate dehydrogenase
315043 EOS14974 AA806538 Hs.130732 ESTs

323377 EOS23308 AA133260 Hs.8454 protein kinase; cAMP-dependent; regulatory; type II; alpha
338260 EOS38191 CH22_6863FG_UNK_EM:AC005500.GENSCAN.279-10

CH22_EM:AC005500.GENSCAN.279-10 
334891 EOS34822 CH22_2208FG_452_5_UNK_EM:AC005500.GENSCAN.341-8 

CH22_FGENES.452_5 
EST cluster (not in UniGene) 

ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
ESTs 1
LIM domain kinase 1
EST cluster (not in UniGene) 
ESTs 

CH22_6105FG__UNK_EM:AC000097.GENSCAN.109-2 

CH22_EM:AC000097.GENSCAN.109-2
AA332145 EST cluster (not in UniGene)

CH22_2186FG_450_2_UNK_EM:AC005500.GENSCAN.339-2
CH22_FGENES.450_2

AA489847 Hs.112019 ESTs; Moderately similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
AA609161 Hs.112657 ESTs; Weakly similar to ORF YOR243c [S.cerevisiae]
AI056776 Hs.133397 ESTs 

c_x_hs gi|6017060|ref| gn 1 + 343924 343997 ex 2 3 CDSi 8.53 74 1715 

CH.X_hs gi|6017060
N98619 Hs.62461 ARP2 (actin-related protein 2; yeast) homolog
AA535580 Hs.16331 ESTs
AW021917 Hs.181878 ESTs
CH22_3430FG_679_7_UNK_DJ32I10.GENSCAN.18-8

CH22_FGENES.679_7
AW272262 Hs.250468 ESTs 
AW152449 Hs.226469 ESTs 
AW504689 EST cluster (not in UniGene) 

AA570698 Hs.193203 ESTs 
AA602917 Hs.156974 ESTs 

CH22_3274FG_635_5_UNK_EM:AC005500.GENSCAN.525-7

CH22_FGENES.635_5 
AI806500 Hs.102652 ESTs; Weakly similar to KIAA0437 [H.sapiens] 
CH22_3048FG_596_2_UNK_EM:AC005500.GENSCAN.488-2 

CH22_FGENES.596_2 
AA278816 Hs.177204 ESTs 

AA079476 Hs.109857 ESTs; Highly similar to CGI-89 protein [H.sapiens]
CH22_3791FG_821_7_UNK_BA232E17.GENSCAN.4-19

CH22_FGENES.821_7
AA813590 Hs.119500 karyopherin alpha 4 (importin alpha 3)
AW269082 Hs.175162 ESTs

c_2_hs gi|5867783|ref| gn 3 + 104472 104673 ex 1 4 CDSf 14.33 202 1308

CH.02_hs gi|5867783
T84852 Hs.98370 cytochrome P450 family member predicted from ESTs
CH22_2830FG_569_1_UNK_EM:AC005500.GENSCAN.456-1

CH22_FGENES.569_1 
R61398 Hs.4197 ESTs 

CH22_3051FG_596_5_UNK_EM:AC005500.GENSCAN.488-5 

CH22_FGENES.596_5 
AI459633 EST singleton (not in UniGene) with exon hit 

CH22_1800FG_397_16_UNK_EM:AC005500.GENSCAN.260-18
CH22_FGENES.397_16
338250 EOS38181 CH22_6848FG__UNK_EM:AC005500.GENSCAN.269-2

CH22_EM:AC005500.GENSCAN.269-2
AI220276 Hs.235228 EST

CH22_2367FG_480_1_UNK_EM:AC005500.GENSCAN.374-1
CH22_FGENES.480_1
ESTs

ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
ESTs; Weakly similar to alternatively spliced product using exon 13A [H.sapiens] 
ESTs 
ESTs 

ESTs; Moderately similar to putative phosphoinositide 5-phosphatase type II [M.musculus] 
pallid (mouse) homolog; pallidin 
CH22_2119FG_435_7_UNK_EM:AC005500.GENSCAN.296^ 
CH22_FGENES.435_7
ESTs 
ESTs 
ESTs 
ESTs 

c_7_hs gi|5868425|ref| gn 2 - 209192 209321 ex 2 3 CDSi 10.41 130 1407
CH.07_hs gi|5868425

328857 EOS28788 c_7_hs gi|6381927|ref| gn 3 - 80557 81051 ex 1 1 CDSo 41.51 495 6090

CH.07_hs gi|6381927
313781 EOS13712 AA078836 EST cluster (not in UniGene)

336953 EOS36884 CH22_4746FG_361_22_ CH22_FGENES.361-22
300233 EOS00164 AI380777 Hs.189402 ESTs

326862 EOS26793 c20_hs gi|6552465|ref| gn 2 + 107702 107782 ex 12 13 CDSi 3.62 81 2149

CH.20_hs gi|6552465



316055 
312414 
300225 
332607 
312405 
313605 
337755 

323216 
334872 

332034 
332103 
318196 
329141 

321539 
313881 
314046 
336045 

324799 
312656 
324662 
323930 
314465 
335697 

321746 
335687 

330731 
315542 
336379 

305691 
310639 
327481 

301910 
335478 

331135 
335690 

308047 
334500 



EOS15986 
EOS12345
EOS00156 
EOS32538 
EOS12336 
EOS13536
EOS37686 

EOS23147 
EOS34803 

EOS31965 
EOS32034 
EOS18127 
EOS29072 

EOS21470 
EOS13812 
EOS13977 
EOS35976 

EOS24730 
EOS12587 
EOS24593 
EOS23861 
EOS14396 
EOS35628

EOS21677 
EOS35618 

EOS30662 
EOS15473 
EOS36310 

EOS05622 
EOS10570 
EOS27412 

EOS01841 
EOS35409 

EOS31066 
EOS35621 

EOS07978 
EOS34431 



320618 
335044 

313789 
311911 
320180 
311036 
323903 
318676 
303007 
334806 

311767 
331750 
314872 
314071 
328450 



EOS20549 
EOS34975 

EOS13720 
EOS11842
EOS20111 
EOS10967 
EOS23834 
EOS18607 
EOS02936 
EOS34737 

EOS11698 
EOS31681 
EOS14803 
EOS14002 
EOS28381 



AI167810

AI087123 

AA846203 

AI539227 

AA773580 

T57448 

AA478876 



AI076686 
AA284372 
AI144254 
AA192455



Hs.217743

Hs.114434

Hs.193974 

Hs.214039 

Hs.193598 

Hs.15467 

Hs.7037 



Hs.190066 
Hs.111471
Hs.239726
Hs.188690




312364 
321541 
307432 
320921 
333110 

324914 
312681 
335697 

308462 
312138 
309116 
320730 
300844 
337570 

332756 
332161 
300942 
300680 
328783 

307542 
331975 
321532 
318721 
302124 
323541 
331057 
316860 
330601 
307334 
323195 
303856 
321553 
332705 
333139 



EOS12295 
EOS21472 
EOS07363 
EOS20852 
EOS33041 

EOS24845 
EOS12612 
EOS35628 

EOS08393 
EOS12069 
EOS09047 
EOS20661 
EOS00775 
EOS37501 

EOS32687 
EOS32092 
EOS00873 
EOS00611 
EOS28714 

EOS07473 
EOS31906 
EOS21463 
EOS18652
EOS02055 
EOS23472 
EOS30988 
EOS16791 
EOS30532 
EOS07265 
EOS23126 
EOS03787 
EOS21484 
EOS32636 
EOS33070 



R40111 
AI220292 
AI244259
R94038 



Hs.187618 
Hs.254467 
Hs.181165 
Hs.199538



ESTs 
ESTs 



338997 EOS38928 



301509 
314522 
303072 
305271 
335287 

321286 
318740 
323465 
300611 
306235 
336721 
311291 
310247 
316564 
328170 

300909 
330869 
311048 
333764 



EOS01440 
EOS14453 
EOS03003 
EOS05202
EOS35218 

EOS21217 
EOS18671 
EOS23396 
EOS00542 
EOS06166 
EOS36652 
EOS11222
EOS10178 
EOS16495 
EOS28101 

EOS00840 
EOS30800 
EOS10979 
EOS33695 



338862 EOS38793 



331467 
327742 

320955 
323589 
319951 
333763 

331046 
320001 
316869 
310774 
319379 
321549 
300823 
324228 
313902 
308928 
333770 



EOS31398 
EOS27673 

EOS20886 
EOS23520 
EOS19882 
EOS33694 

EOS30977 
EOS19932 
EOS16800 
EOS10705 
EOS19310 
EOS21480 
EOS00754 
EOS24159 
EOS13833 
EOS08859 
EOS33701 



eukaryotic translation elongation factor 1 alpha 1 
inhibin; beta C

CH22_338FG_79_16_UNK_EM:AC000097.GENSCAN.59-15

CH22_FGENES.79_16
AA847510 Hs.161292 ESTs

AI028149 Hs.193124 pyruvate dehydrogenase kinase; isoenzyme 3
CH22_3058FG_596_12_UNK_EM:AC005500.GENSCAN.488-13 



AI025435 
AI732331 
AF157833
AA679895 



Hs.117532
Hs.187750 



AI380940 

NM_002543

AA287406 

N75450 

AA932299 

CH22_4244FG_83_17_ 

AA762601 Hs.122684 

AI224982 Hs.211454 

AI743571 Hs.168799 



316934 EOS16665 



CH22_368FG_83_16_UNK_EM:AC000097.GENSCAN.67-19

CH22_FGENES.83_16
CH22_7881FG_UNK_DA59H18.GENSCAN.8-22

CH22_DA59H18.GENSCAN.8-22
ESTs

ESTs; Moderately similar to !!!! ALU CLASS C WARNING ENTRY !!!! [H.sapiens]
EST cluster (not in UniGene) with exon hit 
EST singleton (not in UniGene) with exon hit 
CH22_2629FG_526_11_UNK_EM:AC005500.GENSCAN.420-4 
CH22_FGENES.526_11 
EST cluster (not in UniGene) 
EST cluster (not in UniGene) 
EST cluster (not in UniGene) 
EST cluster (not in UniGene) with exon hit 
EST singleton (not in UniGene) with exon hit 
CH22_FGENES.83-17
ESTs
ESTs

ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
c_6_hs gi|5868071|ref| gn 1 + 93170 93295 ex 9 9 CDSl 13.31 126 3591
CH.06_hs gi|5868071

AW295479 Hs.154903 ESTs; Weakly similar to Abl substrate ena [D.melanogaster]
AA115197 Hs.183702 ESTs
AA506952 Hs.210508 ESTs

CH22_1031FG_271_3_UNK_EM:AC005500.GENSCAN.127-3

CH22_FGENES.271_3
CH22_7715FG_UNK_DJ32I10.GENSCAN.1-6

CH22_DJ32I10.GENSCAN.1-6
N22206 Hs.43112 ESTs

c_5_hs gi|5867944|ref| gn 3 - 143307 143512 ex 1 3 CDSl 11.07 206 172
CH.05_hs gi|5867944

AL049415 Hs.204290 Homo sapiens mRNA; cDNA DKFZp586N2119 (from clone DKFZp586N2119)
AW390054 Hs.192843 ESTs
AA307665 Hs.14559 ESTs

CH22_1030FG_271_2_UNK_EM:AC005500.GENSCAN.127-2
CH22_FGENES.271_2
ESTs 

EST cluster (not in UniGene) 
ESTs 
ESTs 
ESTs 
ESTs 

ESTs; Weakly similar to putative zinc finger protein NY-REN-34 antigen [H.sapiens] 
ESTs 
ESTs 

EST singleton (not in UniGene) with exon hit 
UNK_EM:AC005500.GENSCAN.127-10 
CH22_FGENES.272_1 
AI571647 Hs.146170 ESTs 



Hs.191358 



N66563 
AA873350 
AI954880 
AW134483 
T91443 
AA470984 
AI863068 
AI798146 
AI308165 
AI863908

CH22_1037FG_272_1



Hs.134604 
Hs.164371 
Hs.193963
Hs.161947 
Hs.222665 
Hs.207780 
Hs.156242 



1.8 
1.8 
1.8 
1.8 

1.8 

1.8 
1.8 







CH22_FGENES.596_12 


1.8


AI671311 




EST singleton (not in UniGene) with exon hit 


1.8


T89405 


Hs.218851 


ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]


1.8


AI927149 


Hs.29797 


ribosomal protein L10 


1.8 


AA534539 


Hs.151072


ESTs 


1.8 


AL042759 


Hs.191762 


ESTs 


1.8 


CH22_5856FG_UNK_C65E1.GENSCAN.4-2








CH22_C65E1.GENSCAN.4-2 


1.8 


D63479 


Hs.115907


diacylglycerol kinase; delta (130kD)


1.8 


AA621523 


Hs.165464 


ESTs 


1.8 


AW275006 


Hs.195969


ESTs 


1.8 


AW468066 


Hs.257712 


ESTs; Weakly similar to KIAA0986 protein [H.sapiens] 


1.8 


c_7_hs gi|5868309|ref| gn 5 - 73658 73822 ex 2 5 CDSi 0.78 165 5371 








CH.07_hs gi|5868309 


1.8 


AI280859




EST singleton (not in UniGene) with exon hit 


1.8 


AA464972 


Hs.99624 


ESTs 


1.8 


T77886 


Hs.83428 


nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105)


1.8 


Z28504 




EST cluster (not in UniGene) 


1.8 


AB023967 


Hs.145078 


regulator of differentiation (in S. pombe) 1 


1.8 


AI185116


Hs.104613


ESTs; Weakly similar to Similar to S.cerevisiae hypothetical protein L3111 [H.sapiens]


1.8 


N71399 


Hs.28143 


ESTs 


1.8 


AW139099 


Hs.127489 


ESTs 


1.8 


U90916 


Hs.82845 


Human clone 23815 mRNA sequence 


1.8 


AI214811 


Hs.220615 


ESTs; Weakly similar to TFII-I protein [H.sapiens]


1.8 


AI064982 


Hs.117950


multifunctional polypeptide similar to SAICAR synthetase and AIR carboxylase


1.8 


AA968589 


Hs.944 


glucose phosphate isomerase 


1.8 


H92449 


Hs.116406


ESTs 


1.8 


T59161 


Hs.76293 


thymosin; beta 10 


1.8 



1.8 

1.8 
1.8 
1.8 
1.8 
1.8 

1.8 
1.8 
1.8 
1.8 
1.8 
1.8 
1.8 
1.8 
1.8 
1.8 

1.8 
1.8 
1.8 
1.8 

1.8 

1.8 
1.8 

1.8 
1.8 
1.8 
1.8 

1.7 
1.7 
1.7 
1.7 
1.7 
1.7 
1.7 
1.7 
1.7 
1.7 
1.7 

1.7 
1.7 






313219 
317360 
303530 
334739 



EOS13150 
EOS17291 
EOS03461 
EOS34670 



N74924 
AI125252
AI274851 



Hs.182099 
Hs.126419 
Hs.258744 



ESTs 
ESTs 
ESTs 



71
337670 EOS37601



312079 
320211 
316218 
335682 

330696 
314449 
311972 
307691 
338249 



EOS12010 
EOS20142 
EOS16149 
EOS35613 

EOS30627 
EOS14380 
EOS11903 
EOS07622 
EOS38180 



326399 EOS26330 



313290 
301615 
307034 
313577 
324703 
321317 
312278 
333358 

322735 
326752 

314733 
312902 
322653 
336015 

324500 
310900 
337908 



EOS13221 
EOS01546 
EOS06965 
EOS13508 
EOS24634 
EOS21248 
EOS12209 
EOS33289 

EOS22666 
EOS26683 

EOS14664 
EOS12833 
EOS22584 
EOS35946 

EOS24431 
EOS10831 
EOS37839 



CH22_2051FG_424_14_UNK_EM:AC005500.GENSCAN.285-16

CH22_FGENES.424_14
CH22_5996FG_UNK_EM:AC000097.GENSCAN.57-2

CH22_EM:AC000097.GENSCAN.57-2
T79745 Hs.189717 ESTs
AL039402 Hs.125783 DEME-6 protein
AW207642 Hs.174021 ESTs

CH22_3043FG_595_2_UNK_EM:AC005500.GENSCAN.487-11
CH22_FGENES.595_2 
ESTs 
ESTs 
ESTs 

prothymosin; alpha (gene sequence 28) 
CH22_6847FG_UNK_EM:AC005500.GENSCAN.269-1

CH22_EM:AC005500.GENSCAN.269-1
c19_hs gi|5867353|ref| gn 1 + 6385 6536 ex 6 6 CDSl 10.69 152 684
CH.19_hs gi|5867353 
ESTs 

EST cluster (not in UniGene) with exon hit 
EST singleton (not in UniGene) with exon hit 
ESTs 

Homo sapiens mRNA for cytochrome b5; partial cds 
ESTs; Weakly similar to KIAA0938 protein [H.sapiens]
ESTs

CH22_604FG_141_9_UNK_EM:AC005500.GENSCAN.21-9

CH22_FGENES.141_9 
AA086123 EST cluster (not in UniGene) 

c20_hs gi|5867615|ref| gn 1 - 1214 1562 ex 2 2 CDS! 33.07 349 1366 

CH.20_hs gi|5867615
AW452355 Hs.256037 ESTs
AW292797 Hs.130316 ESTs
AI828854 Hs.171891 ESTs
CH22_3398FG_669_4_UNK_DJ32I10.GENSCAN.9-9

CH22_FGENES.669_4
AW269819 Hs.169905 ESTs

AI922728 Hs.165803 ESTs; Weakly similar to !!!! ALU SUBFAMILY SB WARNING ENTRY !!
CH22_6323FG_UNK_EM:AC005500.GENSCAN.57-1

CH22_EM:AC005500.GENSCAN.57-1



AA022632 
AL042667 
N51511 
AI318285 



AI753247

W39477 

AI142526

AA565051

AB009282

AI937060 

AW205234 



Hs.15825
Hs.225539
Hs.188449
Hs.182371 



Hs.206454 



Hs.155029
Hs.31086 
Hs.202040 
Hs.201587 



!! [H.sapiens]



1.7 
1.7 
1.7 

1.7 

1.7 
1.7 
1.7
1.7

1.7
1.7 
1.7 
1.7 
1.7 

1.7 

1.7 
1.7 
1.7 
1.7 
1.7 
1.7 
1.7 
1.7 

1.7 
1.7 

1.7 
1.7
1.7 
1.7 

1.7 
1.7 
1.7 

1.7 



304084 


EOS04015 


T91986 




EST singleton (not in UniGene) with exon hit 


1.7 


332539 


EOS32470 


AA412528 


Hs.20183 


ESTs; Weakly similar to cDNA EST EMBLT01421 comes from this gene [C.elegans] 


1.7 


314332 


EOS14263


AL037551 


Hs.95612 


ESTs 


1.7 


321412 


EOS21343 


AW366305 




EST cluster (not in UniGene) 


1.7 


312187 


EOS12118 


AA700439 


Hs.188490


ESTs 


1.7 


314147 


EOS14078 


AI656135


Hs.129805 


ESTs 


1.7 


303131 


EOS03062 


AW081061 


Hs.103180 


actin-like 6 


1.7 


331341 


EOS31272 


AA303125 


Hs.119009


ESTs; Weakly similar to !!!! ALU SUBFAMILY SB2 WARNING ENTRY !!!! [H.sapiens]


1.7 


313615 


EOS13546 


AW295194 


Hs.25264 


DKFZP434N126 protein 


1.7 


329598 


EOS29529 


c10_p2 gi|3962482|gb|A gn 4 + 39924 40220 ex 2 3 CDSi 8.71 297 420












CH.10_p2 gi|3962482


1.7 


303579 


EOS03510 


AA381124 


Hs.193353 


ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]


1.7 


331692 


EOS31623 


W93592 


Hs.47343 


ESTs 


1.7 


323977 


EOS23908 


AW328177 


Hs.234713 


ESTs 


1.7 


332930 


EOS32861 


CH22_151FG_38_4_UNK_C20H12.GENSCAN.29-4












CH22_FGENES.38_4 


1.7 


326596 


EOS26527 


c19_hs gi|6138928|ref| gn 4 + 133386 133563 ex 7 9 CDSi -1.32 178 3520












CH.19_hs gi|6138928


1.7 


314946 


EOS14877 


AI097229 


Hs.217484 


ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]


1.7 


315357 


EOS15288 


AA608684 


Hs.121705


ESTs; Moderately similar to !!!! ALU CLASS C WARNING ENTRY !!!! [H.sapiens]


1.7 


324728 


EOS24659 


AA303024 




EST cluster (not in UniGene)


1.7 


317501 


EOS17432 


AA931245 


Hs.137097 


ESTs 


1.7 


332219 


EOS32150 


N22508 


Hs.139315 


ESTs 


1.7 


335369 


EOS35300 


CH22_2718FG_543_7_UNK_EM:AC005500.GENSCAN.432-9












CH22_FGENES.543_7 


1.7 


322417 


EOS22348 


W36286 


Hs.171873 


ESTs; Weakly similar to PUTATIVE STEROID DEHYDROGENASE KIK-I [M.musculus]


1.7 


316100 


EOS16031 


AW203986 


Hs.213003 


ESTs 


1.7 


314866 


EOS14797 


AW305124 


Hs.191682 


ESTs 


1.7 


300328 


EOS00259 


AW015860


Hs.224623 


ESTs 


1.7 


315676 


EOS15607 


AW002565 


Hs.136590 


ESTs 


1.7 


314183 


EOS14114 


AA748600 




EST cluster (not in UniGene) 


1.7 


321354 


EOS21285 


AA078493 




EST cluster (not in UniGene) 


1.7


311904 


EOS11835 


T86907 


Hs.119371


ESTs 


1.7 


322890 


EOS22821 


AA082030 




EST cluster (not in UniGene)


1.7 


302759 


EOS02690 


AI885815 


Hs.184727 


ESTs 


1.7 


324600 


EOS24531 


AA503297 


Hs.117108 


ESTs 


1.7 


314973 


EOS14904 


AW273128 


Hs.254669


EST 


1.7 


324432 


EOS24363 


AA464510 




EST cluster (not in UniGene)


1.7 


331520 


EOS31451 


N49068 


Hs.93966 


ESTs 


1.7 


308380 


EOS08311 


AI623988




EST singleton (not in UniGene) with exon hit 


1.7 


331010 


EOS30941 


H95039 


Hs.32168 


KIAA0442 protein 


1.7 


325363 


EOS25294 


c12_hs gi|5866920|ref| gn 7 + 700446 700516 ex 6 8 CDSi -6.58 71 113












CH.12_hs gi|5866920


1.7 


310470 


EOS10401 


AI281848


Hs.165547 


ESTs 


1.7 


330711 


EOS30642 


AA164687


Hs.177576


mannosyl (alpha-1;3-)-glycoprotein beta-1;4-N-acetylglucosaminyltransferase; isoenzyme A


1.7 






332074 EOS32005 

309732 EOS09653 

306337 EOS06266 

335189 EOS35120 



AA599012 
AW262211 
AA954221 



ESTs 






316253 
332906 

310002 
332258 
336182 



308010 
304521 
318719 
321920 
315019 
320793 
305371 
305054 
314643 
308186 
319371 
331700 
316955 
314961 
336676 
322801 
303363 
328105 



EOS16184 
EOS32839 

EOS09933 
EOS32189 
EOS36113 



328987 EOS28918



324481 
331406 
332280 
332173 
335739 

332104 
315033 
334740 



EOS24412 
EOS31337 
EOS32211 
EOS32104 
EOS35670 

EOS32035 
EOS14964 
EOS34671 



334783 EOS34714 



EOS07941 
EOS04452 
EOS18650 
EOS21851 
EOS14950 
EOS20724 
EOS05302 
EOS04985 
EOS14574 
EOS08117 
EOS19302 
EOS31631 
EOS16886 
EOS14892 
EOS36607 
EOS22732 
EOS03294 
EOS28036 



325481 EOS25412 



315361 
324902 
336018 

308747 
328251 

303153 
327809 

314107 
300304 
313009 
331074 

335773 

334991 

322959 
323731 
331073 
313573 
316949 
328084 

331526 
317987 
325594 

310846 
309268 
304518 
331065 
306501 
323289 
334630 



EOS15292 
EOS24833 
EOS35949 

EOS08678 
EOS28182 

EOS03084 
EOS27740 

EOS14038 
EOS00235 
EOS12940 
EOS31005 

EOS35704 

EOS34922 

EOS22690 
EOS23662 
EOS31004 
EOS13504 
EOS16880 
EOS28015 

EOS31457 
EOS17918 
EOS25525 

EOS10779 
EOS09199 
EOS04449 
EOS30996 
EOS06432 
EOS23220 
EOS34561 



AI916284 
AA610064 
R38100 
F09281 



Hs.199671
Hs.23440 
Hs.106294 
Hs.90424 



AI439190

AA464716 

Z25900 

N63915 

AA532807 

AL049980 

AA714180 

AA634127 

AI587502

AI537940 

R00321 

Z40011 

AW203959 

AW008061 



Hs.181165 

Hs.18724 

Hs.105822
Hs.184216

Hs.182426
Hs.192088

Hs.174928
Hs.160582
Hs.149532 
Hs.231994 



302025 EOS01956 



Hs.22826 

Hs.5662 guanine nucleotide binding protein (G protein); beta polypeptide 2-like 1
Hs.73742 ribosomal protein; large; P0
CH22_2525FG_507_4_UNK_EM:AC005500.GENSCAN.400-4

CH22_FGENES.507_4
AI919537 Hs.118056 ESTs
CH22_129FG_36_12_UNK_C20H12.GENSCAN.28-9

CH22_FGENES.36_12
AW39096 Hs.25832 ESTs

N68670 Hs.103808 ESTs; Weakly similar to RanBPM [H.sapiens]
CH22_3576FG_715_2_UNK_DA59H18.GENSCAN.19-3

CH22_FGENES.715_2
c_9_hs gi|5868535|ref| gn 1 - 25705 25764 ex 3 10 CDSi 9.90 60 438
CH.09_hs gi|5868535
ESTs

KIAA1105 protein
ESTs 
ESTs 

CH22_3102FG_601_10_UNK_EM:AC005500.GENSCAN.491-10 

CH22_FGENES.601_10 
AA609177 Hs.109363 ESTs 
AI493046 Hs.146133 ESTs

CH22_2052FG_424_15_UNK_EM:AC005500.GENSCAN.285-17

CH22_FGENES.424_15
CH22_2095FG_432_8_UNK_EM:AC005500.GENSCAN.293-11

CH22_FGENES.432_8 

eukaryotic translation elongation factor 1 alpha 1 
EST singleton (not in UniGene) with exon hit 

Homo sapiens mRNA; cDNA DKFZp564F093 (from clone DKFZp564F093) 
EST cluster (not in UniGene) 
ESTs 

DKFZP564C152 protein
EST singleton (not in UniGene) with exon hit 
ribosomal protein S2 
ESTs 

EST singleton (not in UniGene) with exon hit 
ESTs 
ESTs 
ESTs 
ESTs 

CH22_FGENES.43-4
ESTs

ESTs; Weakly similar to DIA-156 protein [H.sapiens]
c_6_hs gi|5868020|ref| gn 11 - 301705 301784 ex 4 7 CDSi 5.30 80 3147

CH.06_hs gi|5868020
c12_hs gi|5866957|ref| gn 3 + 47590 47672 ex 4 7 CDSi 2.69 83 1895

CH.12_hs gi|5866957
AI335229 Hs.122031 ESTs
D31323 Hs.211188 ESTs
CH22_3401FG_669_7_UNK_DJ32I10.GENSCAN.9-12

CH22_FGENES.669_7
AI804500 Hs.181165 eukaryotic translation elongation factor 1 alpha 1
c_6_hs gi|6381691|ref| gn 4 + 124444 124557 ex 2 3 CDSi 0.40 114 4554

CH.06_hs gi|6381891
U09759 Hs.8325 mitogen-activated protein kinase 9
c_5_hs gi|5867968|ref| gn 3 + 54610 54761 ex 4 4 CDSl 0.78 152 993

CH.05_hs gi|5867968 
AA806113 Hs.189025 ESTs 
AI637934 Hs.224978 ESTs 
W52010 Hs.191379 ESTs 

R08440 yf19f9.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:127337 3' similar to contains Alu repetitive element; mRNA sequence
CH22_3142FG_607_9_UNK_EM:AC005500.GENSCAN.496-4

CH22_FGENES.607_9
CH22_2312FG_469_11_UNK_EM:AC005500.GENSCAN.365-11
CH22_FGENES.469_11 
EST cluster (not in UniGene) 
EST cluster (not in UniGene) 

ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
ESTs 
ESTs 

c_6_hs gi|6469819|ref| gn 3 - 155366 155459 ex 1 4 CDSl 1.23 94 2982

CH.06_hs gi|6469819
N49967 Hs.46624 ESTs
AW138174 Hs.130651 ESTs

c13_hs gi|5866992|ref| gn 4 - 470474 470566 ex 2 3 CDSi 8.09 93 68
CH.13_hs gi|5866992
ESTs 

ferritin; heavy polypeptide 1 
EST singleton (not in UniGene) with exon hit 
Homo sapiens clone 25085 mRNA sequence 
EST singleton (not in UniGene) with exon hit 
ESTs 

CH22_1938FG_416_6_UNK_EM:AC005500.GENSCAN.277-6

CH22_FGENES.416_6 
AI091466 Hs.127241 DKFZP564F052 protein



CH22_4154FG_43_4_ 
AI831910 Hs.163734
AI964095 Hs.226801 



AI267606 

AA323414 

R07998 

AI076259 

AA856749 



AI459554 
AI985821 
AA461438 
N90584 
AA987294 
AL134235



Hs.18628 
Hs.190337 
Hs.124620 



Hs.161286
Hs.62954 



Hs.9167



Hs.222442 



1.7 
1.6 
1.6 

1.6 
1.6 

1.6 
1.6 
1.6 

1.6 

1.6 
1.6 
1.6 
1.6 
1.6 

1.6 
1.6 
1.6 

1.6 

1.6 
1.6 
1.6 
1.6 
1.6 
1.6
1.6 
1.6 
1.6 
1.6 
1.6 
1.6 
1.6 
1.6 
1.6 
1.6 
1.6 
1.6 

1.6 

1.6 
1.6 
1.6 

1.6 
1.6 

1.6 
1.6 

1.6 
1.6 
1.6 
1.6 

1.6 

1.6 

1.6 
1.6 
1.6 
1.6 
1.6 
1.6 

1.6 
1.6 
1.6 

1.6 
1.6 
1.6 
1.6 
1.6 
1.6 
1.6 

1.6 

1.6 
328998 EOS28929 c_9_hs gi|5868538|ref| gn 1 + 40996 41104 ex 1 3 CDSf 11.00 109 480
CH.09_hs gi|5868538 1.6
313197 EOS13128 AI738851 Hs.222487 ESTs 1.6
338763 EOS38694 CH22_7585FG__UNK_EM:AC005500.GENSCAN.517-16
CH22_EM:AC005500.GENSCAN.517-16 1.6
332247 EOS32178 N58172 Hs.109370 ESTs 1.6
316724 EOS16655 AA810788 Hs.123337 ESTs 1.6
303306 EOS03237 AA215297 EST cluster (not in UniGene) with exon hit 1.6
306336 EOS06267 AA954198 EST singleton (not in UniGene) with exon hit 1.6
308256 EOS08187 AI565498 EST singleton (not in UniGene) with exon hit 1.6
307056 EOS06987 AI148675 EST singleton (not in UniGene) with exon hit 1.6
321370 EOS21301 AJ227900 EST cluster (not in UniGene) 1.6
336262 EOS36193 CH22_3661FG_754_9_UNK_DA59H18.GENSCAN.57-11
CH22_FGENES.754_9 1.6
335497 EOS35428 CH22_2849FG_571_5_UNK_EM:AC005500.GENSCAN.460-26
CH22_FGENES.571_5 1.6
309562 EOS09513 AW169657 EST singleton (not in UniGene) with exon hit 1.6
329563 EOS29494 c10_p2 gi|3962490|gb|A gn 1 - 410 635 ex 2 2 CDSf 13.80 226 267
CH.10_p2 gi|3962490 1.6
332504 EOS32435 AA053917 Hs.15106 chromosome 14 open reading frame 1 1.6
308090 EOS08021 AI474601 Hs.2186 eukaryotic translation elongation factor 1 gamma 1.6

331752 EOS31683 AA287312 Hs.191648 ESTs 1.6 

330881 EOS30812 AA132986 Hs.69321 ESTs; Weakly similar to Similar to mucin and several other Ser-Thr-rich proteins [S.cerevisiae] 1.6
315647 EOS15578 AA648983 Hs.212911 ESTs 1.6
336766 EOS36697 CH22_4341FG_143_20_ CH22_FGENES.143-20 1.6
302592 EOS02523 AA294921 Hs.250811 v-ral simian leukemia viral oncogene homolog B (ras related; GTP binding protein) 1.6
315076 EOS15007 AI623817 Hs.168457 ESTs 1.6
337056 EOS36987 CH22_4946FG_441_4_ CH22_FGENES.441-4 1.6
322175 EOS22106 AF085975 EST cluster (not in UniGene) 1.6
336833 EOS36764 CH22_4504FG_242_2_ CH22_FGENES.242-2 1.6
334902 EOS34833 CH22_2219FG_452_16_UNK_EM:AC005500.GENSCAN.341-19
CH22_FGENES.452_16 1.6
318671 EOS18602 AA188823 Hs.212621 ESTs 1.6
308064 EOS07995 AI469273 Hs.181165 eukaryotic translation elongation factor 1 alpha 1 1.6
320559 EOS20490 AB021981 Hs.159322 solute carrier family 35 (UDP-N-acetylglucosamine (UDP-GlcNAc) transporter); member 3 1.6
317881 EOS17812 AI827248 Hs.224398 ESTs 1.6
313078 EOS13009 N49730 EST cluster (not in UniGene) 1.6
338689 EOS38620 CH22_7464FG_UNK_EM:AC005500.GENSCAN.475-3
CH22_EM:AC005500.GENSCAN.475-3 1.6
311804 EOS11735 AA135159 Hs.203349 ESTs 1.6
316359 EOS16290 AI472213 Hs.123415 ESTs 1.6
330182 EOS30113 c_4_p2 gi|5123954|emb| gn 4 + 120156 120245 ex 2 2 CDSl 4.69 90 11
CH.04_p2 gi|5123954 1.6
334718 EOS34649 CH22_2028FG_421_29_UNK_EM:AC005500.GENSCAN.282-29
CH22_FGENES.421_29 1.6

324196 EOS24127 AA405524 Hs.178000 ESTs 1.6
305350 EOS05281 AA706676 EST singleton (not in UniGene) with exon hit 1.6
331469 EOS31400 N22273 Hs.39140 ESTs 1.6
305715 EOS05646 AA826884 EST singleton (not in UniGene) with exon hit 1.6
314460 EOS14391 AI263231 Hs.145607 ESTs 1.6
317634 EOS17565 AA953088 Hs.127550 ESTs 1.6
335293 EOS35224 CH22_2635FG_527_6_UNK_EM:AC005500.GENSCAN.421-9
CH22_FGENES.527_6 1.6
305611 EOS05542 AA782331 EST singleton (not in UniGene) with exon hit 1.6
310430 EOS10361 AI670843 Hs.200257 ESTs 1.6
323696 EOS23627 AA641201 Hs.222051 ESTs 1.6
300610 EOS00541 N72596 Hs.99120 DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide; Y chromosome 1.6
327364 EOS27295 c_1_hs gi|6552412|ref| gn 2 - 115235 115396 ex 1 9 CDSl 2.77 162 3007
CH.01_hs gi|6552412 1.6
324848 EOS24779 AW021857 EST cluster (not in UniGene) 1.6
321491 EOS21422 H70665 Hs.183960 ESTs 1.6
336367 EOS36298 CH22_3779FG_818_11_UNK_BA232E17.GENSCAN.3-17
CH22_FGENES.818_11 1.6
331549 EOS31480 N56866 Hs.237507 EST 1.6
328332 EOS28263 c_7_hs gi|5868375|ref| gn 6 + 280154 280289 ex 3 5 CDSi -1.04 136 516
CH.07_hs gi|5868375 1.5

322817 EOS22748 C02420 EST cluster (not in UniGene) 1.5
303983 EOS03914 AW514111 Hs.181165 eukaryotic translation elongation factor 1 alpha 1 1.5
329434 EOS29365 c_y_hs gi|5868883|ref| gn 1 - 31124 31263 ex 3 20 CDSi 6.38 140 241
CH.Y_hs gi|5868883 1.5
338196 EOS38127 CH22_6763FG_UNK_EM:AC005500.GENSCAN.235-16
CH22_EM:AC005500.GENSCAN.235-16 1.5
308488 EOS06419 AI682148 Hs.179661 Homo sapiens clone 24703 beta-tubulin mRNA; complete cds 1.5
314883 EOS14814 AW178807 Hs.246182 ESTs 1.5
307095 EOS07026 AM 679 10 EST singleton (not in UniGene) with exon hit 1.5
306953 EOS06684 AJ124971 EST singleton (not in UniGene) with exon hit 1.5
331786 EOS31717 AA398539 Hs.97369 EST 1.5
303509 EOS03440 AW378236 Hs.256050 ESTs 1.5
324515 EOS24446 AW501686 Hs.163539 ESTs 1.5
339323 EOS39254 CH22_8284FG_UNK_BA354I12.GENSCAN.23-2
CH22_BA354I12.GENSCAN.23-2 1.5
306563 EOS06494 AA995296 EST singleton (not in UniGene) with exon hit 1.5
316076 EOS16007 AW297895 Hs.116424 ESTs 1.5
325622 EOS25553 c14_hs gi|5867000|ref| gn 2 + 69994 70075 ex 6 8 CDSi 9.40 82 194
CH.14_hs gi|5867000 1.5
309632 EOS09563 AW193261 Hs.156110 Immunoglobulin kappa variable 1D-8 1.5
314926
314458 
335219 

301079 
334122 

308139 
317412 
315073 
313139 
307012 
322895 
303779 
312344 
323632 
332336 
304547 
335692 



EOS14857 
EOS14389 
EOS35150 

EOS01010 
EOS34053 

EOS08070 
EOS17343 
EOS15004 
EOS13070 
EOS06943 
EOS22826 
EOS03710 
EOS12275 
EOS23563 
EOS32267 
EOS04478 
EOS35623 



AI380838
AI217440 



Hs.124835 
Hs.143873



ESTs 
ESTs 



304143 
329625 



318975 
321875 
320451 
336020 

332581 
338622 



Hs.132604
Hs.257631 



Hs.192152 
Hs.221266 
Hs.181733

Hs.137551 



328333 EOS28264 



EOS04074 
EOS29556 



329960 EOS29891 



EOS18906 
EOS21806 
EOS20382 
EOS35951 

EOS32512 
EOS38553 



CH22_2558FG_513_2_UNK_EM:AC005500.GENSCAN.406-2 

CH22_FGENES.513_2 
AA305047 Hs.183654 ESTs; Weakly similar to unknown [S.cerevisiae]
CH22_1400FG_333_3_UNK_EM:AC005500.GENSCAN.185-27
CH22_FGENES.333_3 
EST singleton (not in UniGene) with exon hit 
ESTs 
ESTs 

EST cluster (not in UniGene) 
EST singleton (not in UniGene) with exon hit 
ESTs 
ESTs 

ESTs; Weakly similar to nitrilase homolog 1 [H.sapiens] 
EST cluster (not in UniGene)
ESTs 

EST singleton (not in UniGene) with exon hit 
CH22_3053FG_596_7_UNK_EM:AC005500.GENSCAN.488-7 

CH22_FGENES.596_7 
c_7_hs gi|5868375|ref| gn 6 + 282506 282664 ex 4 5 CDSi 7.71 159 517

CH.07_hs gi|5868375
R88737 EST singleton (not in UniGene) with exon hit

c11_p2 gi|4567169|gb|A gn 2 - 85893 85984 ex 3 5 CDSi 2.24 92 29

CH.11_p2 gi|4567169
c16_p2 gi|5091594|gb|A gn 1 - 1031 1162 ex 1 3 CDSi 10.75 132 415

CH.16_p2 gi|5091594
Z44110 EST cluster (not in UniGene)

N49122 EST cluster (not in UniGene)

R26944 Hs.180777 Homo sapiens mRNA; cDNA DKFZp564M0264 (from clone DKFZp564M0264)
CH22_3403FG_669_9_UNK_DJ32I10.GENSCAN.9-14

CH22_FGENES.669_9
T28799 Hs.2913 EphB3
CH22_7384FG_UNK_EM:AC005500.GENSCAN.451-1



AI494477 

AI301528 

AW452948 

AA362113 

All 4021 2 

AW470295 

AA897296 

AI742618 

AL039950 

T96130 

AA486189 



335164 EOS35095 



327133 EOS27064 



317460 
332344 
328601 

321677 
331858 
309243 
326213 



EOS17391 
EOS32275 
EOS28732 

EOS21608 
EOS31789 
EOS09174 
EOS26144 



CH22_2500FG_502_8_UNK_EM:AC005500.GENSCAN.396-23

CH22_FGENES.502_8
c21_hs gi|6682522|ref| gn 1 + 38069 38938 ex 2 2 CDSl 63.42 870 1583

CH.21_hs gi|6682522
AA926980 Hs.131347 ESTs 
W45574 Hs.252497 ESTs 

c_7_hs gi|5868321|ref| gn 1 - 44492 44609 ex 2 3 CDSi 1.71 118 5525

CH.07_hs gi|5868321 
N44545 Hs.251865 ESTs 
AA421163 Hs.163848 ESTs 

AI972052 EST singleton (not in UniGene) with exon hit 

c17_hs gi|5867224|ref| gn 3 - 60751 60927 ex 1 4 CDSl 2.06 177 2687



1.5 
1.5 

1.5 
1.5 

1.5 
1.5 
1.5 
1.5 
1.5 
1.5 
1.5 
1.5 
1.5 
1.5 
1.5 
1.5 

1.5 

1.5 
1.5 

1.5 

1.5 
1.5 
1.5 
1.5 

1.5 
1.5 









CH22_EM:AC005500.GENSCAN.451-1 1.5


330397 EOS30328 D14659 Hs.154387 KIAA0103 gene product 1.5
314359 EOS14290 AA205569 Hs.194193 ESTs 1.5
313456 EOS13387 AW380579 Hs.209657 ESTs 1.5
318486 EOS18417 H09123 Hs.139258 ESTs 1.5
318175 EOS18106 AA644624 EST cluster (not in UniGene) 1.5
335684 EOS35615 CH22_3045FG_595_4_UNK_EM:AC005500.GENSCAN.487-13
CH22_FGENES.595_4 1.5
327814 EOS27745 c_5_hs gi|5867968|ref| gn 6 + 69377 70566 ex 1 2 CDSf 86.15 1190 999
CH.05_hs gi|5867968 1.5
322120 EOS22051 W84351 Hs.213846 ESTs 1.5


311749 EOS11680 R06249 Hs.13911 ESTs 1.5
329797 EOS29728 c14_p2 gi|6523160|emb| gn 1 - 10616 10894 ex 3 6 CDSi 5.86 279 1549
CH.14_p2 gi|6523160 1.5
330630 EOS30561 X78669 Hs.79088 reticulocalbin 2; EF-hand calcium binding domain 1.5
303777 EOS03708 AA348491 EST cluster (not in UniGene) with exon hit 1.5
309656 EOS09587 AW197060 Hs.195188 glyceraldehyde-3-phosphate dehydrogenase 1.5
326165 EOS26096 c17_hs gi|5867208|ref| gn 2 - 62787 62929 ex 1 10 CDSl 0.87 143 2037
CH.17_hs gi|5867208 1.5
308328 EOS08259 AI590571 Hs.186412 EST 1.5
300601 EOS00532 AI762130 Hs.165619 ESTs 1.5


303610 EOS03541 AA323288 EST cluster (not in UniGene) with exon hit 1.5
307856 EOS07787 AI366158 EST singleton (not in UniGene) with exon hit 1.5
319920 EOS19851 R54575 Hs.13337 ESTs; Weakly similar to similar to Phosphoglucomutase and phosphomannomutase phosphoserine [C.elegans] 1.5
332167 EOS32098 D57389 Hs.75447 ralA binding protein 1 1.5
316427 EOS16358 AI241019 Hs.145644 ESTs 1.5
303886 EOS03817 AW365963 EST cluster (not in UniGene) with exon hit 1.5
314292 EOS14223 AA732590 Hs.134740 ESTs 1.5
315408 EOS15339 AW273261 Hs.216292 ESTs 1.5


335698 EOS35629 CH22_3059FG_597_1_UNK_EM:AC005500.GENSCAN.489-1
CH22_FGENES.597_1 1.5
315084 EOS15015 AI821085 Hs.187796 ESTs 1.5
302299 EOS02230 R64632 Hs.182167 hemoglobin; gamma A 1.5
306603 EOS06734 AI055860 Hs.193717 interleukin 10 1.5
315802 EOS15733 AA677540 Hs.117064 ESTs 1.5
326257 EOS26168 c17_hs gi|5867264|ref| gn 6 + 222712 222819 ex 2 2 CDSl 4.46 108 3597
CH.17_hs gi|5867264 1.5
319599 EOS19530 H56112 EST cluster (not in UniGene) 1.5
321891 EOS21822 AW157424 Hs.165954 ESTs 1.5



1.5 

1.5 
1.5 
1.5 

1.5 
1.5 
1.5 
1.5 
CH.17_hs gi|5867224 1.5
321632 EOS21563 AA419617 EST cluster (not in UniGene) 1.5
321424 EOS21355 AA057301 EST cluster (not in UniGene) 1.5
322465 EOS22396 AA137152 Hs.3784 ESTs; Highly similar to phosphoserine aminotransferase [H.sapiens] 1.5
333391 EOS33322 CH22_637FG_144_6_UNK_EM:AC005500.GENSCAN.25-6
CH22_FGENES.144_6 1.5
333384 EOS33315 CH22_630FG_143_23_UNK_EM:AC005500.GENSCAN.24-17
CH22_FGENES.143_23 1.5
334784 EOS34715 CH22_2096FG_432_9_UNK_EM:AC005500.GENSCAN.293-12
CH22_FGENES.432_9 1.5
334078 EOS34009 CH22_1356FG_327_33_UNK_EM:AC005500.GENSCAN.181-35
CH22_FGENES.327_33 1.5
335158 EOS35089 CH22_2494FG_502_2_UNK_EM:AC005500.GENSCAN.396-17
CH22_FGENES.502_2 1.5
335062 EOS34993 CH22_2388FG_482_17_UNK_EM:AC005500.GENSCAN.376-16
CH22_FGENES.482_17 1.5
333243 EOS33174 CH22_482FG_111_7_UNK_EM:AC000097.GENSCAN.120-6
CH22_FGENES.111_7 1.5
306380 EOS06311 AA968861 EST singleton (not in UniGene) with exon hit 1.5
320809 EOS20740 AI540299 EST cluster (not in UniGene) 1.5
332813 EOS32744 CH22_29FG_8_1_UNK_C65E1.GENSCAN.2-2
CH22_FGENES.8_1 1.5
335817 EOS35748 CH22_3189FG_618_5_UNK_EM:AC005500.GENSCAN.510-5
CH22_FGENES.618_5 1.5
319551 EOS19482 AA761668 EST cluster (not in UniGene) 1.5
334472 EOS34403 CH22_1771FG_394_3_UNK_EM:AC005500.GENSCAN.257-3
CH22_FGENES.394_3 1.5
333029 EOS32960 CH22_255FG_68_3_UNK_EM:AC000097.GENSCAN.40-3
CH22_FGENES.68_3 1.5

AI468091 Hs.119252 tumor protein; translationally-controlled 1 1.5
AW403330 EST cluster (not in UniGene) with exon hit 1.5
AA167125 EST cluster (not in UniGene) 1.5
AI932285 Hs.160569 ESTs 1.5
c10_p2 gi|3983507|gb|A gn 6 - 38025 38143 ex 3 3 CDSi 2.40 119 170
CH.10_p2 gi|3983507 1.5
333131 EOS33062 CH22_360FG_83_6_UNK_EM:AC000097.GENSCAN.67-10
CH22_FGENES.83_6 1.5
AA600353 Hs.173933 ESTs; Weakly similar to NUCLEAR FACTOR 1/X [H.sapiens] 1.5
AA714040 EST singleton (not in UniGene) with exon hit 1.5
AW291487 Hs.213659 ESTs 1.5
H09693 EST cluster (not in UniGene) 1.5
AW297758 Hs.249721 ESTs 1.5
N55158 Hs.135252 ESTs 1.5
AA421160 Hs.9456 SWI/SNF related; matrix associated; actin dependent regulator of chromatin; subfamily a; member 5 1.5
CH22_2164FG_439_36_UNK_EM:AC005500.GENSCAN.311-13
CH22_FGENES.439_36 1.5
AF180919 EST cluster (not in UniGene) 1.5
CH22_2677FG_535_6_UNK_EM:AC005500.GENSCAN.426-6
CH22_FGENES.535_6 1.5
AJ282468 EST singleton (not in UniGene) with exon hit 1.5
AI216473 Hs.154297 ESTs 1.5
AA580268 EST cluster (not in UniGene) 1.5
c12_hs gi|5866920|ref| gn 9 - 920962 921713 ex 1 8 CDSl 15.95 752 167
CH.12_hs gi|5866920 1.5

W75935 Hs.146083 ESTs 1.5
AI564023 Hs.171467 ESTs; Highly similar to NKG2-D TYPE II INTEGRAL MEMBRANE PROTEIN [H.sapiens] 1.5
AA641638 EST singleton (not in UniGene) with exon hit 1.5
AA099759 EST cluster (not in UniGene) 1.5
CH22_2560FG_513_4_UNK_EM:AC005500.GENSCAN.406-4
CH22_FGENES.513_4 1.5
AA613107 EST singleton (not in UniGene) with exon hit 1.5
CH22_2217FG_452_14_UNK_EM:AC005500.GENSCAN.341-17
CH22_FGENES.452_14 1.5
AI654108 Hs.135125 ESTs 1.5
CH22_8328FG__UNK_BA354I12.GENSCAN.31-3
CH22_BA354I12.GENSCAN.31-3 1.5
c21_hs gi|6531965|ref| gn 58 + 4039993 4040096 ex 3 4 CDSi 0.68 104 1284
CH.21_hs gi|6531965 1.5
c17_hs gi|5867184|ref| gn 2 - 146342 146469 ex 3 4 CDSi 10.00 128 426
CH.17_hs gi|5867184 1.5
c20_hs gi|6682511|ref| gn 5 + 119424 119500 ex 29 30 CDSi 18.89 77 2313
CH.20_hs gi|6682511 1.5
c_7_hs gi|6017031|ref| gn 1 - 35625 35723 ex 4 4 CDSf 5.63 99 5262
CH.07_hs gi|6017031 1.5
CH22_6125FG_UNK_EM:AC000097.GENSCAN.119-11
CH22_EM:AC000097.GENSCAN.119-11 1.5
AW438602 Hs.191179 ESTs 1.5
AA340605 Hs.105887 ESTs 1.5
T52643 EST cluster (not in UniGene) 1.5
AF060567 Hs.126782 sushi-repeat protein 1.5
AA857836 Hs.181165 eukaryotic translation elongation factor 1 alpha 1 1.5
AI929175 Hs.119122 ribosomal protein L13a 1.5
CH22_1465FG_350_15_UNK_EM:AC005500.GENSCAN.209-17
CH22_FGENES.350_15 1.5
335188 EOS35119 CH22_2524FG_507_3_UNK_EM:AC005500.GENSCAN.400-3
CH22_FGENES.507_3 1.5



308055 
302882 
314033 
324928 
329524 



332085 
305369 
300344 
325071 
323693 
321899 
331857 
334850 

322610 
335332 

307565 
314140 
323011 
325366 

322306 
311034 
305081 
322933 
335221 

304948 
334900 

318404 
339358 

327074 

326054 

326892 

328767 

337772 

312199 
303506 
325176 
302023 
305633 
309131 
334184 



EOS07986 
EOS02813 
EOS13964 
EOS24859 
EOS29455 



EOS32016 
EOS05300 
EOS00275 
EOS25002 
EOS23624 
EOS21830 
EOS31788 
EOS34781 

EOS22541 
EOS35263 

EOS07496 
EOS14071 
EOS22942 
EOS25297 

EOS22237 
EOS10965 
EOS05012 
EOS22864 
EOS35152 

EOS04879 
EOS34831 

EOS18335 
EOS39289 

EOS27005 

EOS25985 

EOS26823 

EOS28698 

EOS37703 

EOS12130 
EOS03437 
EOS25107 
EOS01954 
EOS05764 
EOS09062 
EOS34115 
304813 EOS04744 AA584540 EST singleton (not in UniGene) with exon hit 1.5
315359 EOS15290 AA608808 Hs.225118 ESTs 1.5
324434 EOS24365 AA707249 Hs.98789 ESTs 1.5
327910 EOS27841 c_6_hs gi|5868162|ref| gn 1 + 21622 21748 ex 6 7 CDSi 3.69 127 449
CH.06_hs gi|5868162 1.4
335671 EOS35602 CH22_3031FG_592_3_UNK_EM:AC005500.GENSCAN.485-4
CH22_FGENES.592_3 1.4
334943 EOS34874 CH22_2264FG_465_8_UNK_EM:AC005500.GENSCAN.359-8
CH22_FGENES.465_8 1.4
326393 EOS26324 c19_hs gi|5867341|ref| gn 2 + 41702 41841 ex 5 5 CDSi 20.15 140 504
CH.19_hs gi|5867341 1.4
305296 EOS05227 AA687181 EST singleton (not in UniGene) with exon hit 1.4
307243 EOS07174 AI199957 EST singleton (not in UniGene) with exon hit 1.4
320066 EOS19997 AW364885 Hs.112442 ESTs 1.4
311465 EOS11396 AJ758660 Hs.206132 ESTs 1.4
302822 EOS02753 AW404176 Hs.111611 ribosomal protein L27 1.4
304987 EOS04918 AA618044 EST singleton (not in UniGene) with exon hit 1.4
330892 EOS30823 AA149579 Hs.118258 ESTs 1.4
333385 EOS33316 CH22_631FG_143_24_UNK_EM:AC005500.GENSCAN.24-18
CH22_FGENES.143_24 1.4
302626 EOS02557 AB021870 EST cluster (not in UniGene) with exon hit 1.4
318042 EOS17973 AW294522 Hs.149991 ESTs 1.4
339361 EOS39292 CH22_8331FG_UNK_BA354I12.GENSCAN.32-3
CH22_BA354I12.GENSCAN.32-3 1.4
309000 EOS08931 AI880489 EST singleton (not in UniGene) with exon hit 1.4
306004 EOS05935 AA889992 EST singleton (not in UniGene) with exon hit 1.4
329539 EOS29470 c10_p2 gi|3983503|gb|U gn 1 - 1 326 ex 1 3 CDSl 41.66 326 212
CH.10_p2 gi|3983503 1.4
313663 EOS13594 AI953261 Hs.169813 ESTs 1.4

323538 EOS23469 AW247696 EST cluster (not in UniGene) 1.4
337595 EOS37526 CH22_5884FG_UNK_C20H12.GENSCAN.8-1
CH22_C20H12.GENSCAN.8-1 1.4
303149 EOS03080 AA312995 EST cluster (not in UniGene) with exon hit 1.4
308484 EOS08415 AI679292 EST singleton (not in UniGene) with exon hit 1.4
300912 EOS00843 AW138724 Hs.168974 ESTs 1.4
315158 EOS15089 AA744438 Hs.142476 ESTs; Weakly similar to !!!! ALU CLASS D WARNING ENTRY !!!! [H.sapiens] 1.4
300462 EOS00393 AA746501 Hs.14217 ESTs 1.4
312730 EOS12661 AI804372 Hs.208661 ESTs 1.4
316868 EOS16799 AI660898 Hs.195602 ESTs 1.4
337629 EOS37560 CH22_5933FG_UNK_C20H12.GENSCAN.28-35
CH22_C20H12.GENSCAN.28-35 1.4
332518 EOS32449 D16562 Hs.155433 ATP synthase; H+ transporting; mitochondrial F1 complex; gamma polypeptide 1 1.4
337422 EOS37353 CH22_5624FG_760_2_ CH22_FGENES.760-2 1.4
328835 EOS28766 c_7_hs gi|5868339|ref| gn 5 + 88053 88461 ex 3 3 CDSl 13.78 409 5775
CH.07_hs gi|5868339 1.4
338282 EOS38213 CH22_6897FG_UNK_EM:AC005500.GENSCAN.291-4
CH22_EM:AC005500.GENSCAN.291-4 1.4
337895 EOS37826 CH22_6303FG_UNK_EM:AC005500.GENSCAN.56-2
CH22_EM:AC005500.GENSCAN.56-2 1.4
320330 EOS20261 AF026004 Hs.141660 chloride channel 2 1.4
314302 EOS14233 AA813118 Hs.163230 ESTs 1.4
313280 EOS13211 AI285537 Hs.222830 ESTs 1.4
333222 EOS33153 CH22_459FG_105_2_UNK_EM:AC000097.GENSCAN.109-6
CH22_FGENES.105_2 1.4

305726 EOS05657 AA828156 EST singleton (not in UniGene) with exon hit 1.4
312674 EOS12605 AJ762475 Hs.151327 ESTs; Moderately similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 1.4
315869 EOS15800 AI033547 Hs.132826 ESTs 1.4
327010 EOS26941 c21_hs gi|5867664|ref| gn 12 + 941057 941139 ex 9 9 CDSl 7.44 83 790
CH.21_hs gi|5867664 1.4
325892 EOS25823 c16_hs gi|5867088|ref| gn 1 - 10498 10652 ex 2 3 CDSi 3.94 155 870
CH.16_hs gi|5867088 1.4
302575 EOS02506 AF071164 Hs.249171 homeobox A11 1.4
301970 EOS01901 AB028962 Hs.120245 KIAA1039 protein 1.4
332207 EOS32138 H61475 Hs.237353 EST 1.4
316024 EOS15955 AA707141 Hs.193388 ESTs 1.4
314599 EOS14530 AW206512 Hs.186996 ESTs 1.4
333585 EOS33516 CH22_846FG_203_4_UNK_EM:AC005500.GENSCAN.74-6
CH22_FGENES.203_4 1.4
324670 EOS24601 AI525557 EST cluster (not in UniGene) 1.4
321307 EOS21238 R85409 EST cluster (not in UniGene) 1.4
335170 EOS35101 CH22_2506FG_503_1_UNK_EM:AC005500.GENSCAN.397-1
CH22_FGENES.503_1 1.4
328274 EOS28205 c_7_hs gi|5868219|ref| gn 2 - 31244 31439 ex 1 11 CDSl 13.06 196 9
CH.07_hs gi|5868219 1.4
336880 EOS36811 CH22_4619FG_318_8_ CH22_FGENES.318-8 1.4
313825 EOS13756 AA215470 EST cluster (not in UniGene) 1.4
318410 EOS18341 AI138418 Hs.144935 ESTs 1.4
335361 EOS35292 CH22_2710FG_541_11_UNK_EM:AC005500.GENSCAN.431-16
CH22_FGENES.541_11 1.4
319802 EOS19733 AI701489 Hs.202501 ESTs 1.4
334769 EOS34700 CH22_2081FG_429_4_UNK_EM:AC005500.GENSCAN.290-9
CH22_FGENES.429_4 1.4
312709 EOS12640 AW069181 Hs.141146 ESTs; Weakly similar to transformation-related protein [H.sapiens] 1.4
330004 EOS29935 c16_p2 gi|6623963|gb|A gn 5 - 78872 78999 ex 2 6 CDSi 19.93 128 728
CH.16_p2 gi|6623963 1.4
313103 EOS13034 AI184303 Hs.143806 ESTs 1.4
326359 EOS26290 c18_hs gi|5867293|ref| gn 1 + 9436 9494 ex 2 3 CDSi 2.16 59 88
CH.18_hs gi|5867293 1.4
305211 EOS05142 AA668563 EST singleton (not in UniGene) with exon hit 1.4
334628 EOS34559 CH22_1936FG_416_4_UNK_EM:AC005500.GENSCAN.277-4
CH22_FGENES.416_4 1.4
326919 EOS26850 c21_hs gi|6456782|ref| gn 2 - 40486 41046 ex 1 5 CDSl 17.70 561 157
CH.21_hs gi|6456782 1.4
315527 EOS15458 AI791138 Hs.116768 ESTs 1.4
306090 EOS06021 AA908609 EST singleton (not in UniGene) with exon hit 1.4
303316 EOS03247 AF033122 Hs.14125 p53 regulated PA26 nuclear protein 1.4
303642 EOS03573 AW299459 EST cluster (not in UniGene) with exon hit 1.4
314357 EOS14288 AA781795 Hs.122587 ESTs 1.4
337102 EOS37033 CH22_5033FG_472_7_ CH22_FGENES.472-7 1.4
304384 EOS04315 AA235482 Hs.62954 ferritin; heavy polypeptide 1 1.4
315117 EOS15048 AA828609 Hs.192044 ESTs 1.4
305750 EOS05661 AA835250 EST singleton (not in UniGene) with exon hit 1.4
311726 EOS11657 AW081766 Hs.253920 ESTs 1.4
326996 EOS26927 c21_hs gi|5867660|ref| gn 4 - 63212 63404 ex 2 6 CDSi 15.70 193 622
CH.21_hs gi|5867660 1.4
330257 EOS30188 c_5_p2 gi|6671881|gb|A gn 2 - 143228 143393 ex 1 9 CDSl 11.31 166 586
CH.05_p2 gi|6671881 1.4
323864 EOS23795 AA340724 Hs.214028 ESTs 1.4
338204 EOS38135 CH22_6773FG__UNK_EM:AC005500.GENSCAN.241-3
CH22_EM:AC005500.GENSCAN.241-3 1.4
314025 EOS13956 AI983981 Hs.189114 ESTs 1.4
315974 EOS15905 AW029203 Hs.191952 ESTs 1.4
335599 EOS35530 CH22_2957FG_581_39_UNK_EM:AC005500.GENSCAN.476-37
CH22_FGENES.581_39 1.4
335364 EOS35295 CH22_2713FG_543_2_UNK_EM:AC005500.GENSCAN.432-4
CH22_FGENES.543_2 1.4
303634 EOS03565 AI953377 Hs.169425 ESTs; Weakly similar to predicted using Genefinder [C.elegans] 1.4
315626 EOS15557 AA808598 Hs.35353 ESTs; Weakly similar to H21P03.2 [C.elegans] 1.4
329936 EOS29867 c16_p2 gi|6165200|gb|A gn 4 - 82761 82920 ex 3 4 CDSi 1.15 160 199
CH.16_p2 gi|6165200 1.4
328632 EOS28563 c_7_hs gi|5868247|ref| gn 1 + 76734 76853 ex 1 4 CDSf 13.95 120 3764
CH.07_hs gi|5868247 1.4
330207 EOS30138 c_5_p2 gi|6013606|gb|A gn 3 - 109912 110004 ex 2 4 CDSi 6.54 93 174
CH.05_p2 gi|6013606 1.4
329919 EOS29850 c16_p2 gi|6223624|gb|A gn 6 - 103492 103681 ex 1 8 CDSl 6.18 190 93
CH.16_p2 gi|6223624 1.4
331916 EOS31847 AA446131 Hs.124918 ESTs 1.4
317617 EOS17548 T58194 EST cluster (not in UniGene) 1.4
331943 EOS31874 AA453418 Hs.178272 ESTs 1.4
306413 EOS06344 AA973288 EST singleton (not in UniGene) with exon hit 1.4
313607 EOS13538 N94169 Hs.194258 ESTs; Moderately similar to !!!! ALU SUBFAMILY SC WARNING ENTRY !!!! [H.sapiens] 1.4
336292 EOS36223 CH22_3691FG_783_3_UNK_BA354I12.GENSCAN.4-7
CH22_FGENES.783_3 1.4
330453 EOS30384 HG3976-HT4246 Pou-Domain Dna Binding Factor Pit1, Pituitary-Specific 1.4
324602 EOS24533 AA503620 Hs.213239 ESTs 1.4
332183 EOS32114 H08225 Hs.177181 ESTs 1.4
320032 EOS19963 AI699772 Hs.202361 ESTs; Weakly similar to X-linked retinopathy protein [H.sapiens] 1.4
333156 EOS33087 CH22_387FG_89_6_UNK_EM:AC000097.GENSCAN.84-8
CH22_FGENES.89_6 1.4
334156 EOS34087 CH22_1435FG_340_6_UNK_EM:AC005500.GENSCAN.190-7
CH22_FGENES.340_6 1.4
334303 EOS34234 CH22_1594FG_373_6_UNK_EM:AC005500.GENSCAN.233-5
CH22_FGENES.373_6 1.4
325513 EOS25444 c12_hs gi|6017035|ref| gn 1 - 34295 34490 ex 2 7 CDSi 6.49 196 2471
CH.12_hs gi|6017035 1.4
302758 EOS02689 AA984563 EST cluster (not in UniGene) with exon hit 1.4
329557 EOS29488 c10_p2 gi|3962492|gb|A gn 6 - 53197 53647 ex 2 2 CDSf 37.68 451 247
CH.10_p2 gi|3962492 1.4
331717 EOS31648 AA190888 Hs.153881 ESTs; Highly similar to NY-REN-62 antigen [H.sapiens] 1.4
325885 EOS25816 c16_hs gi|5867087|ref| gn 11 + 193212 193377 ex 1 3 CDSf 43.19 166 792
CH.16_hs gi|5867087 1.4
312160 EOS12091 AA805903 Hs.184371 ESTs 1.4
328882 EOS28813 c_7_hs gi|6552423|ref| gn 2 - 157669 157826 ex 4 6 CDSi 4.91 158 6200
CH.07_hs gi|6552423 1.4
339028 EOS38959 CH22_7925FG_UNK_DA59H18.GENSCAN.22-8
CH22_DA59H18.GENSCAN.22-8 1.4
323497 EOS23428 AI523613 Hs.221544 ESTs 1.4
316897 EOS16828 AA838114 EST cluster (not in UniGene) 1.4
312479 EOS12410 AI950844 Hs.128736 ESTs; Weakly similar to non-lens beta gamma-crystallin like protein [H.sapiens] 1.4
338535 EOS38466 CH22_7251FG__UNK_EM:AC005500.GENSCAN.404-3
CH22_EM:AC005500.GENSCAN.404-3 1.4
312754 EOS12685 R99834 Hs.250383 ESTs 1.4
327527 EOS27458 c_2_hs gi|6381882|ref| gn 2 - 98950 99040 ex 4 8 CDSi 5.78 91 1768
CH.02_hs gi|6381882 1.4
324714 EOS24645 AA574312 Hs.245737 ESTs 1.4
302347 EOS02278 AF039400 Hs.194659 chloride channel; calcium activated; family member 1 1.4
338008 EOS37939 CH22_6490FG_UNK_EM:AC005500.GENSCAN.127-9
CH22_EM:AC005500.GENSCAN.127-9 1.4
315590 EOS15521 AA640637 Hs.225817 ESTs 1.4
320825 EOS20756 NM_004751 EST cluster (not in UniGene) 1.4
300930 EOS00861 AI289481 Hs.136371 ESTs 1.4
335225 EOS35156 CH22_2564FG_513_10_UNK_EM:AC005500.GENSCAN.406-9
CH22_FGENES.513_10 1.4

337303 EOS37234 CH22_5442FG_681_5_ CH22_FGENES.681-5 1.4 

317198 EOS17129 AI810384 Hs.128025 ESTs 1.4
308991 EOS08922 AI879831 EST singleton (not in UniGene) with exon hit 1.4
325472 EOS25403 c12_hs gi|6017034|ref| gn 7 - 289581 289657 ex 2 6 CDSi 4.74 77 1786
CH.12_hs gi|6017034 1.4
301266 EOS01197 AA829774 EST cluster (not in UniGene) with exon hit 1.4
330901 EOS30832 AA157818 Hs.238380 Human endogenous retroviral protease mRNA; complete cds 1.4
313406 EOS13337 AJ248314 Hs.132932 ESTs 1.4
301454 EOS01385 AJ751738 EST cluster (not in UniGene) with exon hit 1.4
317269 EOS17200 AA906411 Hs.127378 ESTs 1.4
338876 EOS38807 CH22_7733FG_UNK_DJ32J10.GENSCAN.4-2
CH22_DJ32J10.GENSCAN.4-2 1.4
328481 EOS28412 c_7_hs gi|5868449|ref| gn 1 - 8987 9180 ex 4 31 CDSi 10.00 194 2103
CH.07_hs gi|5868449 1.4
314022 EOS13953 AW452420 Hs.248678 ESTs 1.4
307640 EOS07571 AJ301992 EST singleton (not in UniGene) with exon hit 1.4
315541 EOS15472 AJ168233 Hs.123159 ESTs; Weakly similar to KIAA0668 protein [H.sapiens] 1.4
315489 EOS15420 AA628245 Hs.191847 ESTs 1.4
327815 EOS27746 c_5_hs gi|5867968|ref| gn 6 + 70804 71401 ex 2 2 CDSl 27.99 598 1000
CH.05_hs gi|5867968 1.4
339319 EOS39250 CH22_8280FG_UNK_BA354I12.GENSCAN.22-19
CH22_BA354I12.GENSCAN.22-19 1.4
322564 EOS22495 W86440 Hs.118344 ESTs 1.4
323812 EOS23743 AW081373 Hs.199199 ESTs 1.4
303540 EOS03471 AA355607 Hs.173590 ESTs; Weakly similar to MMSET type I [H.sapiens] 1.4
337902 EOS37833 CH22_6314FG_UNK_EM:AC005500.GENSCAN.56-13
CH22_EM:AC005500.GENSCAN.56-13 1.4
335289 EOS35220 CH22_2631FG_527_2_UNK_EM:AC005500.GENSCAN.421-2
CH22_FGENES.527_2 1.4

327919 EOS27850 c_6_hs gi|5868165|ref| gn 6 + 547701 547800 ex 14 14 CDSl -0.20 100 505
CH.06_hs gi|5868165 1.4
337674 EOS37605 CH22_6005FG_UNK_EM:AC000097.GENSCAN.67-4
CH22_EM:AC000097.GENSCAN.67-4 1.4
320087 EOS20018 AF032387 Hs.113265 small nuclear RNA activating complex; polypeptide 4; 190kD 1.4
334939 EOS34870 CH22_2259FG_465_3_UNK_EM:AC005500.GENSCAN.359-3
CH22_FGENES.465_3 1.3
303443 EOS03374 AA320525 EST cluster (not in UniGene) with exon hit 1.3
325929 EOS25860 c16_hs gi|5867125|ref| gn 2 - 51715 51996 ex 1 1 CDSo 29.05 282 1594
CH.16_hs gi|5867125 1.3
327745 EOS27676 c_5_hs gi|6531959|ref| gn 1 - 229066 229124 ex 3 6 CDSi 3.01 59 177
CH.05_hs gi|6531959 1.3
335166 EOS35097 CH22_2502FG_502_10_UNK_EM:AC005500.GENSCAN.396-25
CH22_FGENES.502_10 1.3
324497 EOS24428 AW152624 Hs.136340 ESTs 1.3
338374 EOS38305 CH22_7017FG_UNK_EM:AC005500.GENSCAN.327-1
CH22_EM:AC005500.GENSCAN.327-1 1.3
313601 EOS13532 R32458 Hs.257711 ESTs 1.3
321415 EOS21346 AI377596 Hs.3337 transmembrane 4 superfamily member 1 1.3
305309 EOS05240 AA699717 EST singleton (not in UniGene) with exon hit 1.3
330447 EOS30378 HG3546-HT3744 Pre-Mrna Splicing Factor Sf2p33, Alt. Splice Form 1 1.3
308578 EOS08509 AI708573 EST singleton (not in UniGene) with exon hit 1.3
315344 EOS15275 AW292176 Hs.245834 ESTs 1.3
330503 EOS30434 M55024 Human cell surface glycoprotein P3.58 mRNA, partial cds 1.3
308227 EOS08158 AJ559126 Hs.195188 glyceraldehyde-3-phosphate dehydrogenase 1.3
332222 EOS32153 N28271 Hs.176618 ESTs 1.3
323961 EOS23892 AL044428 Hs.207345 ESTs 1.3
314530 EOS14461 AJ052358 Hs.131741 ESTs 1.3
320503 EOS20434 NM_005897 EST cluster (not in UniGene) 1.3

60 306820 EOS06751 AI074408 EST singleton (not in UniGene) with exon hit 1.3 

304165 EOS04096 H73265 EST singleton (not in UniGene) with exon hit 1.3 

324302 EOS24233 AA543008 Hs.1 36806 ESTs; Weakly similar to 111! ALU SUBFAMILY J WARNING ENTRY 111! [H. sapiens) 1.3 

319128 EOS19059 AA393820 EST cluster (not in UniGene) 1.3 

317092 EOS17023 AI286162 Hs.125657 ESTs 1.3 

304998 EOS04929 AA621203 EST singleton (not in UniGene) with exon hit 1.3

331433 EOS31364 H68097 Hs.161023 EST 1.3

333348 EOS33279 CH22_594FG_140_2_UNK_EM:AC005500.GENSCAN.20-2

CH22_FGENES.140_2 1.3

333619 EOS33550 CH22_880FG_219_3_UNK_EM:AC005500.GENSCAN.87-2

CH22_FGENES.219_3 1.3

335903 EOS35834 CH22_3280FG_635_11_UNK_EM:AC005500.GENSCAN.525-14

CH22_FGENES.635_11 1.3

326219 EOS26150 c17_hs gi|5867226|ref| gn 1 1 - 264008 264274 ex 3 5 CDSi 5.74 267 2847

CH.17_hs gi|5867226 1.3

324456 EOS24387 AW500954 EST cluster (not in UniGene) 1.3

316405 EOS16336 AA757900 Hs.202624 ESTs 1.3

314361 EOS14292 AL038765 Hs.161304 ESTs 1.3

328546 EOS28477 c_7_hs gi|5868487|ref| gn 1 - 17547 17722 ex 2 3 CDSi 9.96 176 3284

CH.07_hs gi|5868487 1.3

335871 EOS35802 CH22_3246FG_629_19_UNK_EM:AC005500.GENSCAN.519-18

CH22_FGENES.629_19 1.3

303735 EOS03666 AA707750 Hs.202616 ESTs; Weakly similar to cis-Golgi matrix protein GM130 [R.norvegicus] 1.3

324048 EOS23979 AA378739 EST cluster (not in UniGene) 1.3

326720 EOS26651 c20_hs gi|6552456|ref| gn 1 + 84525 84677 ex 5 7 CDSi 11.78 153 1031

CH.20_hs gi|6552456 1.3

322309 EOS22240 AF086372 EST cluster (not in UniGene) 1.3
322136
313460 
306275 
321974 
327600 



336919 
302767 
334786 

302472 
333033 

330493 
330506 
313932 
314394 
323033 
326431 



300548 
316504 
335756 

301209 
306610 
314439 
315396 
335914 



EOS22067 
EOS13391 
EOS06206 
EOS21905 
EOS27531 



329086 EOS29017 



EOS36850 
EOS02698
EOS34717

EOS02403
EOS32964 

EOS30424 
EOS30437 
EOS13863 
EOS14325 
EOS22964 
EOS26362 



335547 EOS35478 



EOS00479 
EOS16435 
EOS35687 

EOS01140 
EOS06541 
EOS14370 
EOS15327 
EOS35845 



333734 EOS33665 



312370 
304636 
323166 
338702 

322331 
318706 
331186 
334764 

327565 

335524 

308050 
334172 

315674 
334876 

315606 
338779 

333511 

329254 

319510 
339418 

321012 
333217 

338561 

335742 

334993 

323430 
306069 
331681 
337986 

313204 
323189 
318171 
307156 
332713 
312828 



EOS12301 
EOS04567 
EOS23097 
EOS38633 

EOS22262 
EOS18637 
EOS31117 
EOS34695 

EOS27496 

EOS35455 

EOS07981 
EOS34103 

EOS15605
EOS34807 

EOS15537 
EOS38710 

EOS33442 

EOS29185 

EOS19441 
EOS39349 

EOS20943 
EOS33148 

EOS38492 

EOS35673 

EOS34924 

EOS23361 
EOS06000 
EOS31612 
EOS37917 

EOS13135 
EOS23120 
EOS18102 
EOS07087 
EOS32644 
EOS12759 



M27626 

M61906 

AI147601

AI380563

AI744284 



Hs.238380 

Hs.6241

Hs.154087 

Hs.130816 

Hs.221727 



AI809912 
AI000635 
AI539443 
AW296107 



AF075083 EST cluster (not in UniGene) 1.3 

AW028655 Hs.136033 ESTs 1.3

AA936312 EST singleton (not in UniGene) with exon hit 1.3

N76794 EST cluster (not in UniGene) 1.3

c_3_hs gi|6004462|ref| gn 1 - 2621 2862 ex 1 4 CDSl -4.01 242 1407

CH.03_hs gi|6004462 1.3
c_x_hs gi|5868604|ref| gn 1 - 35489 35588 ex 2 9 CDSi 2.55 100 719

CH.X_hs gi|5868604 1.3

CH22_4690FG_346_6_ CH22_FGENES.346-6 1.3

H94900 Hs.17882 ESTs 1.3
CH22_2098FG_432_11_UNK_EM:AC005500.GENSCAN.293-14

CH22_FGENES.432_11 1.3

AA317451 Hs.241451 SWI/SNF related; matrix associated; actin dependent regulator of chromatin; subfamily e; member 1 1.3 
CH22_259FG_68_8_UNK_EM:AC000097.GENSCAN.40-8

CH22_FGENES.68_8 1.3

Human endogenous retroviral protease mRNA; complete cds 1.3

phosphoinositide-3-kinase; regulatory subunit; polypeptide 1 (p85 alpha) 1.3

ESTs 1.3

ESTs 1.3

ESTs 1.3
c19_hs gi|5867371|ref| gn 1 + 15855 15971 ex 4 6 CDSi 7.79 117 1108

CH.19_hs gi|5867371 1.3
CH22_2902FG_576_8_UNK_EM:AC005500.GENSCAN.467-8

CH22_FGENES.576_8 1.3

AI026836 Hs.114689 ESTs 1.3

AW135854 Hs.132458 ESTs 1.3
CH22_3123FG_604_5_UNK_EM:AC005500.GENSCAN.493-10

CH22_FGENES.604_5 1.3

Hs.159354 ESTs 1.3

EST singleton (not in UniGene) with exon hit 1.3

Hs.137447 ESTs 1.3

Hs.152686 ESTs 1.3
CH22_3291FG_636_10_UNK_EM:AC005500.GENSCAN.526-10

CH22_FGENES.636_10 1.3
CH22_1000FG_260_2_UNK_EM:AC005500.GENSCAN.119-7 

CH22_FGENES.260_2 1.3 

AA744692 Hs.166539 ESTs 1.3 

AA524031 EST singleton (not in UniGene) with exon hit 1.3

AA291001 EST cluster (not in UniGene) 1.3

CH22_7482FG_UNK_EM:AC005500.GENSCAN.480-1

CH22_EM:AC005500.GENSCAN.480-1 1.3

AF086467 EST cluster (not in UniGene) 1.3

AI383593 Hs.159148 ESTs 1.3

T41159 Hs.8418 ESTs 1.3
CH22_2076FG_428_13_UNK_EM:AC005500.GENSCAN.289-13

CH22_FGENES.428_13 1.3
c_3_hs gi|5867811|ref| gn 1 + 32516 32778 ex 2 3 CDSi 0.20 263 368

CH.03_hs gi|5867811 1.3
CH22_2879FG_572_4_UNK_EM:AC005500.GENSCAN.46M

CH22_FGENES.572_4 1.3

AI460004 EST singleton (not in UniGene) with exon hit 1.3

CH22_1452FG_349_5_UNK_EM:AC005500.GENSCAN.208-6

CH22_FGENES.349_5 1.3

AA651923 Hs.191850 ESTs 1.3
CH22_2190FG_450_6_UNK_EM:AC005500.GENSCAN.339-6

CH22_FGENES.450_6 1.3

AW298724 Hs.202639 ESTs 1.3 
CH22_7610FG_UNK_EM:AC005500.GENSCAN.526-15

CH22_EM:AC005500.GENSCAN.526-15 1.3
CH22_766FG_171_5_UNK_EM:AC005500.GENSCAN.51-5

CH22_FGENES.171_5 1.3
c_x_hs gi|5868733|ref| gn 1 + 4133 4214 ex 1 2 CDSi -0.36 82 2833

CH.X_hs gi|5868733 1.3

W88633 Hs.254562 ESTs 1.3
CH22_8411FG_UNK_DJ579N16.GENSCAN.11-4

CH22_DJ579N16.GENSCAN.11-4 1.3

AA737314 EST cluster (not in UniGene) 1.3

CH22_454FG_104_9_UNK_EM:AC000097.GENSCAN.108^

CH22_FGENES.104_9 1.3
CH22_7294FG_UNK_EM:AC005500.GENSCAN.421-5

CH22_EM:AC005500.GENSCAN.421-5 1.3
CH22_3105FG_601_13_UNK_EM:AC005500.GENSCAN.491-14

CH22_FGENES.601_13 1.3
CH22_2314FG_469_14_UNK_EM:AC005500.GENSCAN.365-18

CH22_FGENES.469_14 1.3

AW062479 EST cluster (not in UniGene) 1.3

AA906983 EST singleton (not in UniGene) with exon hit 1.3

W85712 Hs.119571 collagen; type III; alpha 1 (Ehlers-Danlos syndrome type IV; autosomal dominant) 1.3
CH22_6441FG_UNK_EM:AC005500.GENSCAN.110-7

CH22_EM:AC005500.GENSCAN.110-7 1.3

ESTs 1.3 

ESTs 1.3 

EST cluster (not in UniGene) 1.3

EST singleton (not in UniGene) with exon hit 1.3

mutY (E. coli) homolog 1.3

ESTs; Moderately similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 1.3



AI800518 
AL121194 
AA381202 
AI166762
AA349792
AI865455



Hs.118158
Hs.120589 



Hs.78489 
Hs.211818
337904 EOS37835



329347 EOS29278 



313329 
314367 
317098 
306462 
301254 
335504 



304254 
305731 
323284 
322007 
334537 

302360 
311641 
324643 
327554 

312165 
304679 
319564 
310860 
337161 
311155 
336846 
310985 
329499 



330861 
324658 
323362 
330468 
314196 
339436 

312483 
321505 
332254 
328253 

332357 
329017 

337504 
316625 
335389 

310017 
314354 
324641 
335207 

333673 

334370 

328690 

323208 
307010 
316563 
312219 
319884 
334720 



305448 
314885 
320130 
310567 
323898 



EOS13260 
EOS14298 
EOS17029 
EOS06393 
EOS01185 
EOS35435 



334270 EOS34201 
334324 EOS34255 



EOS04185 
EOS05662 
EOS23215 
EOS21938 
EOS34468 

EOS02291 
EOS11572 
EOS24574 
EOS27485 

EOS12096 
EOS04610 
EOS19495 
EOS10791 
EOS37092 
EOS11086 
EOS36777 
EOS10916 
EOS29430 



334924 EOS34855 



EOS30792 
EOS24589 
EOS23293 
EOS30399 
EOS14129 
EOS39367 

EOS12414
EOS21436 
EOS32185 
EOS28184 

EOS32288 
EOS28948 

EOS37435 
EOS16556 
EOS35320 

EOS09946 
EOS14285 
EOS24572 
EOS35138 

EOS33604 

EOS34301 

EOS28621 

EOS23139 
EOS06941 
EOS16494 
EOS12150 
EOS19815 
EOS34651 



335836 EOS35767 



EOS05379 
EOS14816 
EOS20061 
EOS10498
EOS23829 



AW293704 

AA535749 

AI123513 

AA983397 

AI049624



AA046273 
AA829363 
AA279381 
AW410646 



AW292139 
AA548741 
AA026777 
AW015920 



Hs.115789



Hs.169732 
Hs.161359 



CH22_5180FG_561_3_ 
AI634410 Hs.197608
CH22_4540FG_263_5_ 
T51842 



AA758109 Hs.121072 ESTs 
AI672509 Hs.196582 ESTs 
CH22_7007FG_UNK_EM:AC005500.GENSCAN.323-7 

CH22_EM:AC005500.GENSCAN.323-7
CH22_6318FG_UNK_EM:AC005500.GENSCAN.56-17

CH22_EM:AC005500.GENSCAN.56-17
c x hs gi|6456785|refj gn 1 + 1 8433 1 6897 ex 4 4 CDSI 43.39 465 371 8 
CH.X_hs gi|6456785
Hs.122658 ESTs 

EST cluster (not in UniGene)
Hs.125456 ESTs

EST singleton (not in UniGene) with exon hit
EST cluster (not in UniGene) with exon hit
CH22_2856FG_571_15_UNK_EM:AC005500.GENSCAN.460-34

CH22_FGENES.571_15
CH22_1559FG_368_2_UNK_EM:AC005500.GENSCAN.228-3

CH22_FGENES.368_2
CH22_1616FG_375_1_UNK_EM:AC005500.GENSCAN.235-1
CH22_FGENES.375_1
Hs.111334 ferritin; light polypeptide

EST singleton (not in UniGene) with exon hit 
Hs.190010 ESTs 
Hs.165739 ESTs 
CH22_1839FG_403_2_UNK_EM:AC005500.GENSCAN.268-2 

CH22_FGENES.403_2 
AJ010901 Hs.198267 mucin 4; tracheobronchial
AI946829 Hs.213786 ESTs
AI436356 Hs.130729 ESTs

c_3_hs gi|5867801|ref| gn 2 - 23092 23191 ex 2 6 CDSi 10.44 100 107
CH.03_hs gi|5867801
ESTs 

EST singleton (not in UniGene) with exon hit 
ESTs 
ESTs 

CH22_FGENES.561-3 
EST 

CH22_FGENES.263-5
EST cluster (not in UniGene)
c10_p2 gi|3983518|gb|A gn 5 + 33463 33789 ex 1 1 CDSo 34.50 327 97

CH.10_p2 gi|3983518
CH22_2244FG_459_2_UNK_EM:AC005500.GENSCAN.351-2

CH22_FGENES.459_2
AA084064 Hs.185747 ESTs
AI694767 Hs.129179 ESTs
AL135067 Hs.117182 ESTs

L10343 Hs.112341 protease inhibitor 3; skin-derived (SKALP)
AA897581 Hs.128773 ESTs
CH22_8431FG_UNK_DJ579N16.GENSCAN.19-1

CH22_DJ579N16.GENSCAN.19-1
AI417526 Hs.184636 ESTs
H73183 Hs.129885 ESTs
N64702 Hs.194140 ESTs

c_6_hs gi|6381894|ref| gn 1 - 4411 4509 ex 1 5 CDSl 4.20 99 4561

CH.06_hs gi|6381894
W73417 Hs.103183 EST

c_x_hs gi|6682532|ref| gn 7 - 255591 255672 ex 3 3 CDS1 12.94 82 22 

CH.X_hs gi|6682532 
CH22_5739FG_803_2_ CH22_FGENES.803-2 
AA780307 Hs.122156 ESTs 

CH22_2739FG_545_1_UNK_EM:AC005500.GENSCAN.436-1 

CH22_FGENES.545_1
AJ188739 Hs.148488 ESTs

AL037984 Hs.208982 ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
AI732515 Hs.189218 ESTs

CH22_2546FG_510_4_UNK_EM:AC005500.GENSCAN.402-3 

CH22_FGENES.510_4 
CH22_934FG_246_5_UNK_EM:AC005500.GENSCAN.101-3 

CH22_FGENES.246_5 
CH22_1664FG_378_18_UNK_EM:AC005500.GENSCAN.240-1

CH22_FGENES.378_18
c_7_hs gi|6588001|ref| gn 7 - 571207 571274 ex 1 3 CDSl 3.34 68 4325
CH.07_hs gi|6588001
ESTs

EST singleton (not in UniGene) with exon hit

ESTs; Weakly similar to !!!! ALU SUBFAMILY SP WARNING ENTRY !!!! [H.sapiens]
ESTs

EST cluster (not in UniGene)
CH22_2030FG_421_31_UNK_EM:AC005500.GENSCAN.282-31

CH22_FGENES.421_31
CH22_3210FG_621_3_UNK_EM:AC005500.GENSCAN.513-3

CH22_FGENES.621_3
AA737894 Hs.29797 ribosomal protein L10
AJ049878 Hs.133032 ESTs
AI820675 Hs.203804 ESTs 
AI691065 Hs.155780 ESTs 
AA347566 EST cluster (not in UniGene) 



AA203415 Hs.136200
AI140014 

AI587083 Hs.200558 

H73505 Hs.117874 
T73234 



1.3 
1.3 

1.3 

1.3 

1.3 
1.3 
1.3 
1.3 
1.3 
1.3 

1.3 

1.3 

1.3 
1.3 
1.3 
1.3 
1.3 

1.3 
1.3 
1.3 
1.3 

1.3 
1.3 
1.3 
1.3 
1.3 
1.3 
1.3 
1.3 
1.3 

1.3 

1.3 
1.3 
1.3 
1.3 
1.3 
1.3 

1.3 
1.3 
1.3 
1.3 

1.3 
1.3 

1.3
1.3 
1.3 

1.3 
1.3 
1.3 
1.3 

1.3 

1.3 

1.3 

1.3 
1.3 
1.3 
1.3 
1.3 
1.3 

1.3 

1.3 
1.3 
1.3 
1.3 
1.3 
1.3
336132 EOS36063 CH22_3522FG_703_2_UNK_DA59H18.GENSCAN.9-2

CH22_FGENES.703_2 1.3

337958 EOS37889 CH22_6403FG_UNK_EM:AC005500.GENSCAN.98-6

CH22_EM:AC005500.GENSCAN.98-6 1.3

305630 EOS05561 AA804508 EST singleton (not in UniGene) with exon hit 1.3

334916 EOS34847 CH22_2235FG_457_7_UNK_EM:AC005500.GENSCAN.347-1

CH22_FGENES.457_7 1.3

333542 EOS33473 CH22_799FG_178_4_UNK_EM:AC005500.GENSCAN.5&-4

CH22_FGENES.178_4 1.3


331151 EOS31082 R32331 Hs.164599 ESTs 1.3

315095 EOS15026 AA831815 Hs.243788 ESTs 1.3

331593 EOS31524 N72150 Hs.50193 EST 1.3

323767 EOS23698 AI807408 Hs.166368 ESTs 1.3

334561 EOS34492 CH22_1865FG_405_1_UNK_EM:AC005500.GENSCAN.270-5

CH22_FGENES.405_1 1.3

308191 EOS08122 AI538878 EST singleton (not in UniGene) with exon hit 1.3

319571 EOS19502 N91399 Hs.220826 ESTs 1.3

316200 EOS16131 AI914535 Hs.221377 ESTs 1.3

305996 EOS05927 AA889338 Hs.163356 EST 1.2

316055 EOS17986 AI249193 Hs.145945 ESTs 1.2

315570 EOS15501 AI860360 Hs.160316 ESTs 1.2

320792 EOS20723 AW236504 Hs.247020 ESTs 1.2

331649 EOS31580 W20364 Hs.55412 ESTs; Weakly similar to c29 [M.musculus] 1.2

303839 EOS03770 Z45939 EST cluster (not in UniGene) with exon hit 1.2

324399 EOS24330 AA814768 Hs.21396 ESTs 1.2

317172 EOS17103 AJ741232 Hs.206744 ESTs 1.2

312452 EOS12383 AI692643 Hs.172749 ESTs 1.2

325482 EOS25413 c12_hs gi|5866957|ref| gn 3 + 47957 48078 ex 5 7 CDSi 10.25 122 1896

CH.12_hs gi|5866957 1.2

311395 EOS11326 R23313 EST cluster (not in UniGene) 1.2

336124 EOS36055 CH22_3513FG_701_9_UNK_DA59H18.GENSCAN.8-9

CH22_FGENES.701_9 1.2

320082 EOS20013 AA487678 Hs.189738 ESTs 1.2

312168 EOS12099 T92251 Hs.198882 ESTs 1.2

338000 EOS37931 CH22_6472FG_UNK_EM:AC005500.GENSCAN.119-5

CH22_EM:AC005500.GENSCAN.119-5 1.2

338852 EOS38783 CH22_7705FG_UNK_DJ246D7.GENSCAN.12-1

CH22_DJ246D7.GENSCAN.12-1 1.2

312090 EOS12021 N57692 Hs.118064 ESTs 1.2

316480 EOS16411 AI749921 Hs.205377 ESTs 1.2



333259 EOS33190 CH22_500FG_118_7_UNK_EM:AC005500.GENSCAN.2-7

CH22_FGENES.118_7 1.2
335211 EOS35142 CH22_2550FG_511_2_UNK_EM:AC005500.GENSCAN.403-2

CH22_FGENES.511_2 1.2

321950 EOS21881 AA594780 Hs.172318 ESTs 1.2
337937 EOS37868 CH22_6370FG_UNK_EM:AC005500.GENSCAN.86-1

CH22_EM:AC005500.GENSCAN.86-1 1.2

316576 EOS16507 AI732114 Hs.193046 ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 1.2

322770 EOS22701 AA045796 Hs.159971 SWI/SNF related; matrix associated; actin dependent regulator of chromatin; subfamily b; member 1 1.2
329369 EOS29300 c_x_hs gi|5868842|ref| gn 1 - 121148 121516 ex 3 4 CDSi 8.50 369 3910

CH.X_hs gi|5868842 1.2

304183 EOS04114 H91161 EST singleton (not in UniGene) with exon hit 1.2

339370 EOS39301 CH22_8343FG_UNK_BA232E17.GENSCAN.1-12

CH22_BA232E17.GENSCAN.1-12 1.2

303941 EOS03872 AW473878 Hs.156110 Immunoglobulin kappa variable 1D-8 1.2

302245 EOS02176 H18835 EST cluster (not in UniGene) with exon hit 1.2 

335255 EOS35186 CH22_2597FG_517_2_UNK_EM:AC005500.GENSCAN.411-2

CH22_FGENES.517_2 1.2

316610 EOS16541 AW087973 Hs.126731 ESTs 1.2

314915 EOS14846 AA573072 Hs.187748 ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 1.2

315426 EOS15357 AI391486 Hs.128171 ESTs 1.2
334003 EOS33934 CH22_1281FG_310_28_UNK_EM:AC005500.GENSCAN.167-27

CH22_FGENES.310_28 1.2

304350 EOS04281 AA186871 EST singleton (not in UniGene) with exon hit 1.2

325173 EOS25104 AI133215 Hs.144662 ESTs; Moderately similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 1.2

312313 EOS12244 AW293341 Hs.122505 ESTs 1.2
333366 EOS33297 CH22_612FG_142_3_UNK_EM:AC005500.GENSCAN.22-6

CH22_FGENES.142_3 1.2
334970 EOS34901 CH22_2291FG_466_3_UNK_EM:AC005500.GENSCAN.361-2

CH22_FGENES.466_3 1.2
338668 EOS38599 CH22_7441FG_UNK_EM:AC005500.GENSCAN.465-1

CH22_EM:AC005500.GENSCAN.465-1 1.2
336502 EOS36433 CH22_3926FG_833_8_UNK_DJ579N16.GENSCAN.5-9

CH22_FGENES.833_8 1.2

309438 EOS09369 AW102802 Hs.225787 ESTs; Moderately similar to hypothetical protein [H.sapiens] 1.2
336194 EOS36125 CH22_3591FG_717_20_UNK_DA59H18.GENSCAN.20-19

CH22_FGENES.717_20 1.2

336678 EOS36609 CH22_4156FG_43_6_ CH22_FGENES.43-6 1.2

321401 EOS21332 W90406 Hs.35962 ESTs 1.2

306026 EOS05957 AA902309 EST singleton (not in UniGene) with exon hit 1.2

336434 EOS36365 CH22_3854FG_826_1_UNK_BA232E17.GENSCAN.8-1

CH22_FGENES.826_1 1.2

315257 EOS15188 AW157431 Hs.248941 ESTs 1.2
328349 EOS28280 c_7_hs gi|5868383|ref| gn 7 - 260704 260804 ex 2 9 CDSi 4.37 101 621

CH.07_hs gi|5868383 1.2
326112 EOS26043 c17_hs gi|5867192|ref| gn 1 + 2151 2725 ex 1 1 CDSo 54.87 575 1272
CH.17_hs gi|5867192 1.2

333995 EOS33926 CH22_1272FG_310_19_UNK_EM:AC005500.GENSCAN.167-18

CH22_FGENES.310_19 1.2

323683 EOS23614 AJ380045 Hs.225033 ESTs 1.2

330143 EOS30074 c21_p2 gi|4210430|emb| gn 3 + 184737 184848 ex 4 4 CDSl 1.71 112 111

CH.21_p2 gi|4210430 1.2

329789 EOS29720 c14_p2 gi|6469354|emb| gn 2 - 118977 119036 ex 1 3 CDSl 1.19 60 1517

CH.14_p2 gi|6469354 1.2

324397 EOS24328 AA307836 Hs.118758 ESTs; Weakly similar to RLF [H.sapiens] 1.2

308729 EOS08660 AI799766 Hs.208627 EST 1.2

323939 EOS23870 AW499632 Hs.115696 ESTs 1.2

333444 EOS33375 CH22_694FG_153_1_UNK_EM:AC005500.GENSCAN.34-1

CH22_FGENES.153_1 1.2

306302 EOS06233 AA937901 EST singleton (not in UniGene) with exon hit 1.2

313693 EOS13624 AW469160 Hs.170651 ESTs 1.2

316652 EOS16583 AA789249 EST cluster (not in UniGene) 1.2

332325 EOS32256 T79428 Hs.191264 ESTs 1.2

336235 EOS36166 CH22_3633FG_740_2_UNK_DA59H18.GENSCAN.44-2

CH22_FGENES.740_2 1.2

319436 EOS19367 R02750 EST cluster (not in UniGene) 1.2

312335 EOS12266 AW043620 Hs.236993 ESTs 1.2

322109 EOS22040 AI884327 Hs.244737 ESTs 1.2

328466 EOS28397 c_7_hs gi|5868434|ref| gn 1 - 15643 15900 ex 1 2 CDSl 2.36 258 1608

CH.07_hs gi|5868434 1.2

323244 EOS23175 T70731 EST cluster (not in UniGene) 1.2

312510 EOS12441 AA779907 Hs.117558 ESTs 1.2

314853 EOS14784 AA729232 Hs.153279 ESTs 1.2

336946 EOS36877 CH22_4731FG_355_2_ CH22_FGENES.355-2 1.2

303874 EOS03805 AA258921 EST cluster (not in UniGene) with exon hit 1.2

312658 EOS12589 AA730280 Hs.120936 ESTs 1.2

308354 EOS08285 AI611044 EST singleton (not in UniGene) with exon hit 1.2

310073 EOS10004 AI335004 Hs.148558 ESTs 1.2

324777 EOS24708 AA744046 Hs.133350 ESTs 1.2

300897 EOS00828 AI890356 Hs.127804 ESTs 1.2

308371 EOS08302 AI620666 Hs.242510 EST 1.2

306358 EOS06289 AA961821 EST singleton (not in UniGene) with exon hit 1.2

312295 EOS12226 AA578233 Hs.173863 ESTs 1.2

319792 EOS19723 R20317 Hs.22968 ESTs 1.2

338546 EOS38477 CH22_7267FG_UNK_EM:AC005500.GENSCAN.410-1

CH22_EM:AC005500.GENSCAN.410-1 1.2

314546 EOS14477 AW007211 Hs.186672 ESTs 1.2

338494 EOS38425 CH22_7184FG_UNK_EM:AC005500.GENSCAN.385-5

CH22_EM:AC005500.GENSCAN.385-5 1.2

331131 EOS31062 R54797 Hs.26238 EST; Weakly similar to reverse transcriptase homolog [H.sapiens] 1.2

309939 EOS09870 AW419122 EST singleton (not in UniGene) with exon hit 1.2

332932 EOS32863 CH22_153FG_38_6_UNK_C20H12.GENSCAN.29-6

CH22_FGENES.38_6 1.2

309653 EOS09584 AW196800 Hs.180842 ribosomal protein L13 1.2

318647 EOS18578 AI526152 EST cluster (not in UniGene) 1.2

304044 EOS03975 T52479 Hs.252259 ribosomal protein S3 1.2

330307 EOS30238 c_7_p2 gi|4877982|gb|A gn 2 + 107384 107559 ex 2 4 CDSi 9.96 176 4

CH.07_p2 gi|4877982 1.2

314499 EOS14430 AL044570 Hs.147975 ESTs 1.2

338053 EOS37984 CH22_6552FG_UNK_EM:AC005500.GENSCAN.158-1

CH22_EM:AC005500.GENSCAN.158-1 1.2

332991 EOS32922 CH22_215FG_56_4_UNK_EM:AC000097.GENSCAN.17-4

CH22_FGENES.56_4 1.2

306308 EOS06239 AA946870 EST singleton (not in UniGene) with exon hit 1.2

338120 EOS38051 CH22_6655FG_UNK_EM:AC005500.GENSCAN.195-1

CH22_EM:AC005500.GENSCAN.195-1 1.2

313703 EOS13634 AI161293 Hs.146862 ESTs; Weakly similar to KIAA0525 protein [H.sapiens] 1.2

330563 EOS30494 U50553 Hs.147916 DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 3 1.2

332886 EOS32817 CH22_106FG_33_7_UNK_C20H12.GENSCAN.22-9

CH22_FGENES.33_7 1.2

303844 EOS03775 U94362 Hs.58589 glycogenin 2 1.2

321755 EOS21686 AI215881 Hs.144042 ESTs 1.2

333532 EOS33463 CH22_789FG_175_19_UNK_EM:AC005500.GENSCAN.53-25

CH22_FGENES.175_19 1.2

332863 EOS32794 CH22_81FG_28_3_UNK_C20H12.GENSCAN.18-3

CH22_FGENES.28_3 1.2

333254 EOS33185 CH22_495FG_118_2_UNK_EM:AC005500.GENSCAN.2-2

CH22_FGENES.118_2 1.2

317459 EOS17390 AI367254 Hs.131248 ESTs 1.2

315353 EOS15284 AW452608 Hs.129817 ESTs 1.2

300732 EOS00663 AI369956 Hs.257891 ESTs 1.2

303502 EOS03433 AA488528 EST cluster (not in UniGene) with exon hit 1.2

333126 EOS33057 CH22_355FG_82_3_UNK_EM:AC000097.GENSCAN.66-10

CH22_FGENES.82_3 1.2

332929 EOS32860 CH22_150FG_38_3_UNK_C20H12.GENSCAN.29-3

CH22_FGENES.38_3 1.2

329502 EOS29433 c10_p2 gi|3983517|gb|U gn 1 + 75 338 ex 1 1 CDSo 46.82 264 100

CH.10_p2 gi|3983517 1.2

333408 EOS33339 CH22_657FG_145_6_UNK_EM:AC005500.GENSCAN.26-6

CH22_FGENES.145_6 1.2

315472 EOS15403 AA828850 Hs.165469 ESTs 1.2

328290 EOS28221 c_7_hs gi|5868363|ref| gn 2 - 127366 127498 ex 1 5 CDSl 5.24 131 289
328662 EOS28593
319808 EOS19739
303929 EOS03860
315712 EOS15643
307391 EOS07322
335499 EOS35430
303792 EOS03723
327287 EOS27218
317713 EOS17644
330137 EOS30068
308157 EOS08088
314452 EOS14383
308268 EOS08199
321467 EOS21398
320993 EOS20924
336778 EOS36709
319827 EOS19758
308249 EOS08180
310094 EOS10025
336902 EOS36833
339044 EOS38975
336675 EOS36606
303563 EOS03494
330673 EOS30604
311814 EOS11745
335481 EOS35412
314775 EOS14706
324961 EOS24892
313458 EOS13389
307074 EOS07005
337964 EOS37895
326519 EOS26450
337366 EOS37297
322340 EOS22271
307954 EOS07885
328615 EOS28546
317787 EOS17718
335288 EOS35219
323175 EOS23106
330893 EOS30824
306810 EOS06741
338239 EOS38170
332347 EOS32278
309782 EOS09713
322518 EOS22449
301187 EOS01118
312129 EOS12060
334714 EOS34645
316586 EOS16517
320488 EOS20419
327456 EOS27389
336707 EOS36638
313561 EOS13492
330906 EOS30837
330987 EOS30918
325041 EOS24972
313225 EOS13156
305295 EOS05226
306896 EOS06827
326981 EOS26912
332225 EOS32156
318802 EOS18733
318413 EOS18344
312292 EOS12223
323753 EOS23684
313582 EOS13513
317836 EOS17767
332868 EOS32799
336924 EOS36855
327791 EOS27722



CH22_4367FG_159_4_ 

T62778 

AI560998

AW450967 Hs.235240
CH22_4655FG_331_2_
CH22_7944FG_



CH22_4153FG_43_3_
AA367699 Hs.118787
D57823 Hs.92962
AW377113 Hs.119640



CH.07_hs gi|5868363 
c_7_hs gi|6004473|ref| gn 22 + 1184773 1184855 ex 7 8 CDSi 12.72 83 3916

CH.07_hs gi|6004473
T58960 EST cluster (not in UniGene)

AW470753 EST singleton (not in UniGene) with exon hit

AI950133 Hs.120882 ESTs; Moderately similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
AI225058 EST singleton (not in UniGene) with exon hit

CH22_2851FG_571_8_UNK_EM:AC005500.GENSCAN.460-28

CH22_FGENES.571_8
C75094 Hs.199839 ESTs; Highly similar to NG22 [H.sapiens]
c_1_hs gi|5867479|ref| gn 1 - 62838 63024 ex 4 5 CDSi 11.66 187 1628

CH.01_hs gi|5867479
AI733306 Hs.128071 ESTs

c21_p2 gi|4210430|emb| gn 1 - 21220 21377 ex 2 3 CDSi 1.89 158 104

CH.21_p2 gi|4210430
AI510824 Hs.75968 thymosin; beta 4; X chromosome
AL042699 Hs.209222 ESTs
AI567509 Hs.172928 collagen; type I; alpha 1
X13075 EST cluster (not in UniGene)

AL050145 Hs.225986 Homo sapiens mRNA; cDNA DKFZp586C2020 (from clone DKFZp586C2020)
CH22_FGENES.159-4
EST cluster (not in UniGene)
EST singleton (not in UniGene) with exon hit
ESTs

CH22_FGENES.331-2
UNK_DA59H18.GENSCAN.27-5

CH22_DA59H18.GENSCAN.27-5
CH22_FGENES.43-3

transforming growth factor; beta-induced; 68kD 
Sec23 (S. cerevisiae) homolog A 
ESTs; Moderately similar to zinc finger protein [H.sapiens] 
CH22_2833FG_570_10_UNK_EM:AC005500.GENSCAN.46O4

CH22_FGENES.570_10
AJ149880 Hs.188809 ESTs
AA613792 EST cluster (not in UniGene)

AA007259 Hs.255853 ESTs

AI150989 EST singleton (not in UniGene) with exon hit

CH22_6410FG_UNK_EM:AC005500.GENSCAN.100-9

CH22_EM:AC005500.GENSCAN.100-9
c19_hs gi|5867439|ref| gn 4 + 166004 166243 ex 4 5 CDSi 4.49 240 2534

CH.19_hs gi|5867439
CH22_5551FG_736_1_ CH22_FGENES.736-1
AF088076 EST cluster (not in UniGene)

AI419692 EST singleton (not in UniGene) with exon hit

c_7_hs gi|5868239|ref| gn 2 + 35214 35347 ex 3 4 CDSi 11.49 134 3651

CH.07_hs gi|5868239 
AW339612 Hs.249364 ESTs 

CH22_2630FG_527_1_UNK_EM:AC005500.GENSCAN.421-1 

CH22_FGENES.527_1 
AI827137 Hs.184023 ESTs
AA149620 Hs.71999 ESTs

AI057294 EST singleton (not in UniGene) with exon hit

CH22_6833FG_UNK_EM:AC005500.GENSCAN.264-5

CH22_EM:AC005500.GENSCAN.264-5
ESTs

Immunoglobulin kappa variable 1D-8
EST cluster (not in UniGene)
EST cluster (not in UniGene) with exon hit
EST cluster (not in UniGene)
CH22_2024FG_421_25_UNK_EM:AC005500.GENSCAN.282-25

CH22_FGENES.421_25
AI205077 Hs.144689 ESTs
R31386 EST cluster (not in UniGene)

c_2_hs gi|6004455|ref| gn 3 + 173257 173378 ex 5 7 CDSi 4.03 122 1184
CH.02_hs gi|6004455
CH22_FGENES.64-3
EST cluster (not in UniGene) 
ESTs 

ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
ESTs 
ESTs 

EST singleton (not in UniGene) with exon hit 
EST singleton (not in UniGene) with exon hit 
c21_hs gi|6588016|ref| gn 3 + 105091 106038 ex 1 1 CDSo 122.69 948 567 

CH.21_hs gi|6588016
N33213 Hs.100425 ESTs
R19443 Hs.92414 ESTs
AI138592 Hs.144936 ESTs
AW451893 Hs.151124 ESTs
AA327102 EST cluster (not in UniGene)

AW207684 Hs.13583 ESTs
AA983913 Hs.128929 ESTs
CH22_86FG_28_8_UNK_C20H12.GENSCAN.18-8

CH22_FGENES.28_8
CH22_4699FG_347_9_ CH22_FGENES.347-9
c_5_hs gi|5867977|ref| gn 1 + 22491 22610 ex 6 7 CDSi 11.29 120 658



Hs.221716
Hs.156110



W60326 
AW275156 
AI133446
AA806542 
AW300867 



CH22_4212FG_64_3_
AA040155
AA169498
H40988
AI809182
AA502354
AA667131
AI093383



Hs.72804 
Hs.131965 
Hs.130907
Hs.151529 



1.2 

1.2 
1.2 
1.2 
1.2 
1.2 

1.2 
1.2 

1.2 
1.2 

1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 

1.2 
1.2 
1.2 
1.2 
1.2 

1.2 
1.2 
1.2 
1.2 
1.2 

1.2 

1.2 
1.2 
1.2 

1.2 

1.2 
1.2 

1.2 
1.2 
1.2 
1.2 

1.2 
1.2 
1.2 
1.2 
1.2 
1.2 

1.2 
1.2 
1.2 

1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2
1.2 
1.2 

1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 

1.2 
1.2
330717
322944 
312108 
332570 
330880 
310341 
334012 

318230 
336071 

338510 

334487 

320661 
335200 



EOS30648 
EOS22875 
EOS12039
EOS32501
EOS30811
EOS10272
EOS33943

EOS18161
EOS36002 

EOS38441 

EOS34418 

EOS20592 
EOS35131 



333582 EOS33513 



320789 
321185 
337740 

315064 
334883 

331825 
319141 
333682 



EOS20720 
EOS21116 
EOS37671 

EOS14995 
EOS34814 

EOS31756 
EOS19072 
EOS33613 



336140 EOS36071 



320727 
323947 
324746 
306744 
326517 



EOS20658 
EOS23878 
EOS24677
EOS06675
EOS26448



333597 EOS33528 



330135 EOS30066 



315118 
302893 
337169 
336121 

323332 
320911 
327990 

320425 
327075 

314384 
338716 

330886 
327331 



EOS15049 
EOS02824 
EOS37100 
EOS36052

EOS23263 
EOS20842 
EOS27921 

EOS20356 
EOS27006 

EOS14315 
EOS38647 

EOS30817 
EOS27262 



326714 EOS26645 



316734 
311660 
312757 
331686 
337840 

332093 
319595 
315990 
322438 
332965 

337182 
334948 

325864 

337760 

315422 
338889 



EOS16665 
EOS11591 
EOS12688 
EOS31617 
EOS37771 

EOS32024 
EOS19526 
EOS15921 
EOS22369 
EOS32896 

EOS37113 
EOS34879 

EOS25795 

EOS37691 

EOS15353 
EOS38820 



AA233926 

AA112573

T82331 

AA401376 

AA132420 

AW302773 



Hs.23635 

Hs.127453 

Hs.26176 

Hs.53542 



U96044 
AA649842 
AA603367 
AI031882 



CH.05_hs gi|5867977
ESTs

EST cluster (not in UniGene)
ESTs
ESTs

KIAA0986 protein
EST cluster (not in UniGene)
CH22_1290FG_313_3_UNK_EM:AC005500.GENSCAN.169-3

CH22_FGENES.313_3
AA558125 EST cluster (not in UniGene)

CH22_3457FG_685_3_UNK_DJ32I10.GENSCAN.21-6

CH22_FGENES.685_3
CH22_7208FG_UNK_EM:AC005500.GENSCAN.391-22

CH22_EM:AC005500.GENSCAN.391-22
CH22_1786FG_395_9_UNK_EM:AC005500.GENSCAN.258-10

CH22_FGENES.395_9
AA864846 EST cluster (not in UniGene)

CH22_2538FG_508_9_UNK_EM:AC005500.GENSCAN.401-9

CH22_FGENES.508_9
CH22_842FG_201_2_UNK_EM:AC005500.GENSCAN.72-3

CH22_FGENES.201_2
R78712 EST cluster (not in UniGene)

H51659 Hs.189854 ESTs
CH22_6085FG_UNK_EM:AC000097.GENSCAN.100-6

CH22_EM:AC000097.GENSCAN.100-6
AA775208 Hs.136423 ESTs

CH22_2197FG_451_6_UNK_EM:AC005500.GENSCAN.340-6 

CH22_FGENES.451_6 
AA411144 Hs.104768 ESTs 
F12377 EST cluster (not in UniGene) 

CH22_944FG_247_10_UNK_EM:AC005500.GENSCAN.102-10

CH22_FGENES.247_10
CH22_3530FG_705_2_UNK_DA59H18.GENSCAN.10-2
CH22_FGENES.705_2
EST cluster (not in UniGene)
Hs.186667 ESTs
Hs.222294 ESTs

EST singleton (not in UniGene) with exon hit
c19_hs gi|5867439|ref| gn 1 + 44732 46356 ex 6 6 CDSl 148.22 1625 2512

CH.19_hs gi|5867439
CH22_858FG_211_5_UNK_EM:AC005500.GENSCAN.79-5

CH22_FGENES.211_5
c21_p2 gi|4456470|emb| gn 2 - 121583 121885 ex 2 2 CDSf 18.67 303 102

CH.21_p2 gi|4456470
AA564921 Hs.143899 ESTs

AL117539 Hs.173515 Homo sapiens mRNA; cDNA DKFZp586H021 (from clone DKFZp586H021)
CH22_5189FG_563_1_ CH22_FGENES.563-1
CH22_3510FG_701_6_UNK_DA59H18.GENSCAN.8-6

CH22_FGENES.701_6
AI829520 Hs.227513 ESTs
AI056872 Hs.133386 ESTs

c_6_hs gi|5868218|ref| gn 2 - 36225 36503 ex 1 2 CDSl 16.35 279 1419
CH.06_hs gi|5868218

C14069 Hs.201627 ESTs; Moderately similar to !!!! ALU SUBFAMILY SQ WARNING ENTRY !!!! [H.sapiens]
c21_hs gi|6531965|ref| gn 58 + 4041318 4041431 ex 4 4 CDSl 1.79 114 1285
CH.21_hs gi|6531965

AA535840 Hs.162203 ESTs; Weakly similar to alternatively spliced product using exon 13A [H.sapiens]
CH22_7502FG_UNK_EM:AC005500.GENSCAN.488-9

CH22_EM:AC005500.GENSCAN.488-9
AA135606 Hs.189384 ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens]
c_1_hs gi|5867516|ref| gn 4 - 55606 55737 ex 2 6 CDSi 7.01 132 2349

CH.01_hs gi|5867516
c20_hs gi|5867595|ref| gn 2 + 124490 124568 ex 5 6 CDSi 0.11 79 1020
CH.20_hs gi|5867595
ESTs 
ESTs 
ESTs 
ESTs 

CH22_6223FG_UNK_EM:AC005500.GENSCAN.26-9

CH22_EM:AC005500.GENSCAN.26-9
ESTs
ESTs
ESTs
ESTs

CH22_189FG_50_3_UNK_EM:AC000097.GENSCAN.3-5

CH22_FGENES.50_3
CH22_5204FG_570_2_ CH22_FGENES.570-2
CH22_2269FG_465_15_UNK_EM:AC005500.GENSCAN.359-13

CH22_FGENES.465_15
c16_hs gi|5867069|ref| gn 2 - 110834 110904 ex 3 3 CDSf 9.76 71 457

CH.16_hs gi|5867069
CH22_6110FG_UNK_EM:AC000097.GENSCAN.116-8

CH22_EM:AC000097.GENSCAN.116-8
AW135357 Hs.192374 ESTs
CH22_7746FG_UNK_DJ32I10.GENSCAN.7-1

CH22_DJ32I10.GENSCAN.7-1



AW080237 
AI978583 
AI285970 
W88502 



AA608794 
H81361 
AI800041 
W44531 



Hs.252884 
Hs.232161
Hs.183817 
Hs.182258 



Hs.112592
Hs.194485 
Hs.190555 
Hs.167851 



1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 

1.2 
1.2 

1.2 

1.2 

1.2 
1.2 

1.2 

1.2 
1.2 
1.2 

1.2 
1.2 

1.2 
1.2 
1.1

1.1 

1.1 
1.1 
1.1 
1.1 
1.1 

1.1 

1.1 

1.1 
1.1 
1.1 
1.1 

1.1 
1.1 
1.1 

1.1 
1.1 

1.1 
1.1 

1.1 
1.1 

1.1 

1.1 
1.1 
1.1 
1.1 
1.1 

1.1 
1.1 
1.1 
1.1 
1.1 

1.1 
1.1 

1.1 

1.1 

1.1 
1.1 

1.1
332961 EOS32892 CH22_185FG_48_18_UNK_EM:AC000097.GENSCAN.2-14

CH22_FGENES.48_18 1.1

314703 EOS14634 AI791249 EST cluster (not in UniGene) 1.1

317791 EOS17722 AI801500 Hs.128457 ESTs 1.1
333680 EOS33611 CH22_942FG_247_7_UNK_EM:AC005500.GENSCAN.102-7

CH22_FGENES.247_7 1.1

322419 EOS22350 AA248987 Hs.14084 ESTs; Highly similar to zinc RING finger protein SAG [M.musculus] 1.1
338124 EOS38055 CH22_6661FG_UNK_EM:AC005500.GENSCAN.196-2

CH22_EM:AC005500.GENSCAN.196-2 1.1

308884 EOS08815 AI833131 Hs.179100 ESTs 1.1
333349 EOS33280 CH22_595FG_140_3_UNK_EM:AC005500.GENSCAN.20-3

CH22_FGENES.140_3 1.1

313150 EOS13081 AA824410 Hs.165003 ESTs 1.1
339208 EOS39139 CH22_8146FG_UNK_FF113D11.GENSCAN.6-3

CH22_FF113D11.GENSCAN.6-3 1.1

335653 EOS35584 CH22_3013FG_590_4_UNK_EM:AC005500.GENSCAN.484-4

CH22_FGENES.590_4 1.1

319524 EOS19455 AA682865 Hs.194441 ESTs 1.1

301576 EOS01507 AI682905 Hs.146875 ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 1.1

317598 EOS17529 AW206035 Hs.192123 ESTs 1.1
333473 EOS33404 CH22_724FG_162_3_UNK_EM:AC005500.GENSCAN.42-10

CH22_FGENES.162_3 1.1
333949 EOS33880 CH22_1225FG_303_5_UNK_EM:AC005500.GENSCAN.162-9

CH22_FGENES.303_5 1.1
339256 EOS39187 CH22_8207FG_UNK_BA354I12.GENSCAN.7-11

CH22_BA354I12.GENSCAN.7-11 1.1
332884 EOS32815 CH22_104FG_33_5_UNK_C20H12.GENSCAN.22-7

CH22_FGENES.33_5 1.1

314660 EOS14591 AA436007 Hs.188780 ESTs 1.1 
..=30 333220 EOS33151 CH22_457FG_104_12_UNK_EM:AC000097.GENSCAN. 108-11 

CH22_FGENESJ04_12 1,1 

i.fj 308106 EOS08037 AI476803 EST singleton (not in UniGene) with exon hit 1.1 

~Z 320709 EOS20640 AA456660 Hs.154165 ESTs 1.1 

307612 EOS07543 AI290787 EST singleton (not in UniGene) with exon hit 1.1 

2,35 330286 EOS30217 c_5_p2gij6671913|gb|Agn2-31050 31171 ex27CDSi 8.84 122 791 

*Zf CH.05_p2gi|6671913 1.1 

.Q 304495 EOS04426 AA446448 EST singleton (not in UniGene) with exon hit 1.1 

.r^ 310583 EOS10514 AW205632 Hs.211198 ESTs 1.1 

332896 EOS32827 CH22_117FG_35_10_UNK_C20H12.GENSCAN.24-9 

H40 CH22.FGENES.35J0 1.1 

337602 EOS37533 CH22_5895FG_UNK_C20H12.GENSCAN.15-1 

W CH22_C20H12.GENSCAN.15-1 1.1 

.r. 307626 EOS07557 A1300035 EST singleton (not in UniGene) with exon hit 1.1 

r 334696 EOS34627 CH22_2006FG_421_5_UNK_EM:AC005500.GENSCAN.282-5 

1^45 CH22_FGENES.421_5 1.1 

ffi 318652 EOS18583 T53259 EST cluster (not in UniGene) 1.1 

337844 EOS37775 CH22_6229FG_UNK_EM:AC005500.GENSCAN.30-9 

f? =Ss CH22_EM:AC005500.GENSCAN.30-9 1 . 1 

~=;i 334823 EOS34754 CH22_2137FG_437_5_UNK_EM:AC005500.GENSCAN.301-7 

CH22.FGENES.437.5 1.1 
if 5 * 333928 EOS33859 CH22 1201FG 299 2 UNK_EM:AC005500.GENSCAN. 158-5 

- 6= * CH22_FGENES.299_2 1.1 

?=- 337503 EOS37434 CH22_5738FG_803_1_ CH22_FGENES.803-1 1,1 

„ 323044 EOS22975 AA148725 Hs.154190 ESTs 1.1 
55 329164 EOS29095 C x_hs gj|5868691|ref| gn 1 + 62305 6251 7 ex 2 2 CDS1 17.51 213 1868 

CH.X_hs gi|5868691 1.1 
335468 EOS35399 CH22_2619FG_567 4.UNK EM:AC005500.GENSCAN.454-12 

CH22_FGENES.567_4 1.1 
338962 EOS38893 CH22_7838FG_UNK_DJ32M0.GENSCAN.23-39 

60 CH22.DJ32I10.GENSCAN.23-39 1.1 

323570 EOS23501 AL038623 Hs.208752 ESTs; Weakly similar to ALU SUBFAMILY SX WARNING ENTRY IN! [H.sapiens] 1.1 
333568 EOS33499 CH22_826FG_185_1_UNK_EM:AC005500.GENSCAN.64-1 

CH22.FGENES. 185,1 1.1 

331865 EOS31796 AA425756 Hs.98445 ESTs 1.1 
65 336246 EOS36177 CH22_3644FG_746_5_UNK_DA59H18.GENSCAN.484 

CH22_FGENES.746_5 1.1 

337238 EOS37169 CH22_5343FG_641_3_ CH22.FGENES.641-3 1.1 

305089 EOS05020 AA642622 EST singleton (not in UniGene) with exon hit 1.1 

„ 300097 EOS00028 AI916973 Hs.213603 ESTs 1.1 

70 313134 EOS13065 N63406 Hs.258697 ESTs 1.1 

337452 EOS37383 CH22 5665FG 775 J _ CH22.FGENES.775-1 1.1 
325433 EOS25364 c12_hs gi|5866936|ref| gn 4 • 480706 480826 ex 3 4 CDSi 1 .99 121 818 

CH.12_hspj|5866936 1.1 
335999 EOS35930 CH22_3380FG 657.1 UNK DJ246D7.GENSCAN.11-1 

75 CH22_FGENES.657_1 1.1 

333580 EOS33511 CH22_840FG_199_2_UNK_EM:AC005500.GENSCAN.71-2 

CH22_FGENES.199_2 1.1 

336836 EOS36767 CH22 4512FG_247_11_ CH22.FGENES.247-1 1 1.1 
334677 EOS34608 CH22_1986FG_418_30_UNK_EM:AC005500.GENSCAN. 279-31 

80 CH22_FGENES.418_30 1.1 

329062 EOS28993 C x_hs gj|5866590|ref) gn 3 - 58977 59094 ex 4 11 CDSi -6.19 1 18 627 

CH.X_hs gi|5868590 1,1 
333671 EOS33602 CH22_932FG_245_5_UNK_EM:AC005500.GENSCAN.100-12 

CH22_FGENES.245_5 1.1 

85 304941 EOS04872 AA612612 EST singleton (not in UniGene) with exon hit 1.1 

315772 EOS15703 AW515373 Hs.158893 ESTs 1.1 
EOS00080
EOS37738
EOS27736


336722
334220 


EOS15606 
EOS36653 
EOS34151 




336703 
336397 


EOS36634 
EOS36328 




316105 
334661 


EOS16036 
EOS34592 
333997


EOS07714 
EOS33928 


328249
323561
335916


EOS23492 
EOS01395 
EOS35847 



AA843986 Hs.190586 ESTs 1.1 
CH22_777FG_1 74_3_UNK_EM:AC005500.GENSCAN.53-6 

CH22_FGENES.174_3 1.1 
AI559820 Hs.199438 ESTs 1.1 
AW025517 Hs.1 33250 ESTs 1.1 
AA972165 Hs.1 50308 ESTs 1.1 
CH22 6028FG_UNK EM:AC000097.GENSCAN.78-12 

CH22_EM:AC000097.GENSCAN.78-12 1.1 
N24830 yx70a02.s1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:267050 3' similar to gb|M87912|HUMALNE562 Human carcinoma cell-derived Alu RNA transcript, (rRNA);contains Alu repetitive element;, mRNA sequence. 1.1

CH22_3859FG_827.4_UNK_DJ579N16.GENSCAN.1-3 

CH22_FGENES.827_4 1.1 
c20_hs gi|6682509|refj gn 2 - 167988 168179 ex 4 4 CDSf 18.69 192 2238 

CH.20_hs gi|6682509 1.1 
CH22.4793FG 380 9_ CH22_FGENES.380-9 1.1 
CH22_1260FG_310 7 UNK_EM:AC005500.GENSCAN.167-5 

CH22_FGENES.310_7 1.1 
c_7_hs gj{6552423|ref) gn 1 + 105580 105774 ex 6 7 CDSi 2.91 195 6195 

CH.07_hs gi|6552423 1.1 
D83777 Hs.75137 KIAA0193 gene product 1.1
AI826999 Hs.224624 ESTs 1.1
c14_hs gi|6682483|ref| gn 1 - 129273 130754 ex 1 1 CDSo 11.82 1482 2225

CH.14_hs gi|6682483 1.1
N52510 Hs.1 86470 ESTs 1.1 
CH22_3069FG_599_8_UNK_EM:AC005500.GENSCAN.490-11 

CH22_FGENES.599_8 1.1 
AW502257 EST cluster (not in UniGene) 1 , 1 

CH22_6391 FG_UNK_EM:AC005500.GENSCAN.94-1 

CH22_EM:AC005500.GENSCAN.94-1 1.1 
CH22_3313FG_646_6_UNK_DJ246D7.GENSCAN.1-5 

CH22_FGENES.646_6 1.1 
CH22_2233FG_457_3_UNK_EM:AC005500.GENSCAN.346-2 

CH22.FGENES.457.3 1.1 

AW150648 Hs.75621 protease inhibitor 1 (anti-elastase); alpha-1-antitrypsin 1.1

AW368520 Hs.24639 ESTs 1.1

AA094436 Hs.155712 follistatin-like 1 1.1
CH22 926FG 244 1 UNK EM:AC005500.GENSCAN.99-1 

CH22_FGENES.244_1 1.1 
CH22_3235FG_629_5_UNK_EMAC005500.GENSCAN.519-4 

CH22_FGENES.629_5 1.1 

A1682536 Hs.1 63495 ESTs 1 . 1 

AW448916 Hs.149018 ESTs 1.1 

AI028162 Hs.132307 ESTs 1.1 
CH22.61 78FG__UNK_EM:ACOQ5500.GENSCAN.9-4 

CH22_EMAC005500.GENSCAN.9-4 1.1 

CH22_4688FG_346_4 CH22 FGENES.346-4 1.1 

CH22_5722FG_799_2_ CH22.FGENES.799-2 1.1 

T92107 Hs.188489 ESTs 1.1 
CH22J99FG_51_10_UNK_EM:AC000097.GENSCAN.4-12 

CH22_FGENES.51_I0 1.1 
c_5_hs gi|5867968|ref| gn 2 + 19952 20019 ex 1 2 CDSI 9.47 68 988 

CH.05_hs gi|5867968 1.1 
CH22_8153FG__UNK_FF1 13D1 t.GENSCAN.6-10 

CH22_FF113D11.GENSCAN.6-10 1.1 

T69279 EST cluster (not in UniGene) 1.1 

AA827082 EST cluster (not in UniGene) 1.1 

CH22_697FG_154_5_UNK_EM:AC005500.GENSCAN.35-6 

CH22 FGENES.154 5 1.1 
CH22.481FGJ 1 1_6_UNK_EM:AC000097.GENSCAN. 120-5 

CH22 FGENES.111 6 1.1 
CH22 7343FG_UNK EM:AC005500.GENSCAN.437-2 

CH22_EM:AC005500.GENSCAN.437-2 1.1 
c16_p2 gi|4567166|gb|A gn 2 + 72861 73052 ex 1 3 CDSf 18.02 192 590 

CH.16_p2gi|4567166 1.1 

AA652272 Hs.197320 ESTs 1.1 

CH22_4245FG_84_2_ CH22.FGENES.84-2 1.1 
CH22_1503FG_359_4_UNK_EM:AC005500.GENSCAN.217-7 

CH22.FGENES.359 4 1.1 

CH22_4201FG_56_3_ CH22 FGENES.56-3 1 1 
CH22_3812FG_823_12_UNK_BA232E17.GENSCAN.6-11 

CH22 FGENES.823 12 1.1 

AW295687 Hs.254420 ESTs 1 1 
CH22 J 969FG.4 1 8_9_UNK_EM:AC005500.GENSCAN.279-1 3 

CH22_FGENES.418_9 1.1 

AI347274 EST singleton (not in UniGene) with exon hit 1.1 

CH22.1 275FG.31 0_22_UNK_EM:AC005500,GENSCAN.1 67-2 1 

CH22_FGENES.310_22 1.1 

AA436673 Hs.29417 Homo sapiens mRNA; cDNA DKFZp586B0323 (from clone DKFZp586B0323) 1.1 
c_6_hs gi|6381891|ref| gn 2 - 96352 96527 ex 2 3 CDSi 6.19 176 4550 

CH.06 hsgi|6381891 1.1 
CH22_6849FG__UNK_EM:AC0055O0.GENSCAN.270-1 

CH22 EM:AC005500.GENSCAN.270-1 1.1 

AA825426 Hs.238832 ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 1.1

AA991519 Hs.253324 ESTs 1 1 
CH22_3293FG_636_12_UNK_EMiAC005500.GENSCAN.526-12 
CH22_FGENES.636_12
321828 EOS21759 X56197 EST cluster (not in UniGene) 

327413 EOS27344 c 2 hs gtj 5867750] ref] gn 3 + 101410 101508 ex 4 5 CDSi 4.34 99 587 

CH.02_hs gi|5867750 

334474 EOS34405 CH22 1773FG__394_5_UNK_EM:AC005500.GENSCAN.257-5 

CH22_FGENES.394_5 
EOS36670 CH22 4291FG 117_3_ CH22.FGENES. 117-3 
EOS16448 Al 7843 15 Hs. 1231 63 ESTs 

EOS25450 C12 hsgt|6017036|refj gn 5- 186804 186915 ex 1 3 CDSI 8.36112 2508 

CH.12_hsgi|6017036 
EOS33806 CH22J 145FG 291.1 1_UNK_EM:AC005500.GENSCAN.14^6 

CH22_FGENES.291_11 
EOS38152 CH22_6797FG_UNK_EM:AC005500.GENSCAN.246-10 

CH22_EM:AC005500.GENSCAN.246-1 0 
EOS36809 CH22_4617FG_318_5_ CH22_FGENES.318-5 
EOS37850 CH22_6338FG_UNK_EM:AC005500.GENSCAN.66-5 

CH22_EM:AC005500.GENSCAN.66-5 
EOS09759 AW293999 EST singleton (not in UniGene) with exon hit 

EOS05190 AA679225 EST singleton (not in UniGene) with exon hit 

EOS33853 CH22J 1 95FG_296_1 3_UNK_EM:AC005500.GENSCAN. 1 55-1 6 

CH22_FGENES.296_13 
EOS22023 AF085833 EST cluster (not in UniGene) 

EOS13287 AI266254 Hs.132929 ESTs 
EOS1 8778 Z42908 Hs. 1 2308 ESTs 
EOS37106 CH22_5195FG_567J_ CH22.FGENES.567-1 
EOS36910 CH22_4802FG_385_4_ CH22.FGENES.385-4 
EOS12100 AI064824 Hs. 193385 ESTs 
EOS36129 CH22 3595FG_719_2_UNK_DA59H18.GENSCAN.21-2 

CH22_FGENES.719_2 

EOS21879 AA309612 Hs.118797 ubiquitin-conjugating enzyme E2D 3 (homologous to yeast UBC4/5)
EOS24623 AA557952 EST cluster (not in UniGene)

EOS30326 D10923 Hs.137555 putative chemokine receptor; GTP-binding protein
EOS33050 CH22_347FG_80_4_UNK_EM:AC000O97.GENSCAN.65-4 

CH22_FGENES.80_4 
EOS15943 AA764950 Hs.119898 ESTs 
EOS00073 AI743419 Hs.205707 ESTs 
EOS17146 AW014242 Hs.159998 ESTs 

EOS29457 c10_p2 gi| 39 83506 |gbjU gn 2 + 12251 12325 ex 3 3 CDSI 7.37 75 178 

CH.10_p2gj|3983506 
EOS17340 AA764968 Hs.4864 KIAA0892 protein 
EOS39161 CH22_8 1 7 1 FG_UNK_BA3541 1 2.GENSCAN. 1 -6 

CH22_BA3541 1 2.GENSCAN. 1 -6 
EOS 11 529 AW023595 Hs.232048 ESTs 

EOS39095 CH22_8091 FG UNK_DA59H 1 8.GENSCAN.69-4 

CH22_DA59H18.GENSCAN.69-4 
326725 EOS26656 C20_hs gi|6552456|ref] gn 2 • 223005 223125 ex 5 6 CDSi 6.10 121 1038 

CH.20_hs gi|6552456 
EOS30883 H02855 Hs.29567 ESTs 

EOS34552 CH22.1 928FG_412_4_UNK_EM:AC005500,GENSCAN.275-4 

CH22.FGENES.412.4 
EOS01616 W67730 EST cluster (not in UniGene) with exon hit 

EOS0871 2 AI81 1 707 EST singleton (not in UniGene) with exon hit 

EOS23344 AA248828 Hs. 22 5676 ESTs 

EOS06654 AI026 1 51 EST singleton (not in UniGene) with exon hit 

EOS31189 Z41777 Hs.27413 ESTs 
EOS1 2959 AI355433 Hs. 1 90856 ESTs 
EOS32933 CH22 226FG 59_3_UNK_EM:AC000097.GENSCAN.21-3 

CH22.FGENES.59_3 
EOS02942 AF090405 EST cluster (not in UniGene) with exon hit 

EOS17618 AA972990 Hs.127904 ESTs 

EOS26710 c 7 hs gi|5868309lref| gn 4 + 41 570 41 639 ex 1 5 CDSf 2.65 70 5365 

CH.07_hs gi|5868309 
EOS38638 CH22 7487FG_UNK_EM:AC005500.GENSCAN.482-2 

CH22_EM:AC005500.GENSCAN.482-2 
EOS37905 CH22 6427FG_UNK_EM;AC005500.GENSCAN. 106-3 

CH22_EM:AC005500.GENSCAN.106-3 
EOS32785 CH22_71FG_22_1_UNK_C20H12.GENSCAN.15-2 

CH22_FGENES.22_1 
EOS11156 AW451982 Hs.248613 ESTs 
EOS37025 CH22 5018FG_465_19_ CH22_FGENES.465-19 
EOS19288 F13425 Hs.26229 ESTs 
EOS32889 CH22 182re_48_15_UNK.EM:AC000097.GENSCAN.2-11 

CH22_FGENES.48_15 
EOS09565 AW193825 EST singleton (not in UniGene) with exon hit 

EOS21102 AI769410 Hs.221461 ESTs 
EOS16371 AI954795 Hs.156135 ESTs 
EOS1 1 596 AW294254 Hs.223742 ESTs 

EOS27479 c_3_hs gj[5867797lref| gn 2 - 81 067 81 1 30 ex 3 7 CDSi 6.42 64 1 2 

CH.03_hs gi|5867797 
EOS14871 AW452768 Hs.162045 ESTs 

EOS26332 C19 hs gi|5867355|ref| gn 1 + 35165 35332 ex 9 1 1 CDSi 0.41 1 68 788 

CH.19_hs gi|5867355 
336347 EOS36278 CH22 3759FG_815_3_UNK_BA232El7.GENSCAN.1-24 

CH22 FGENES.815.3 

322297 EOS22228 W76548 Hs.136026 ESTs; Moderately similar to !!!! ALU SUBFAMILY SC WARNING ENTRY !!!! [H.sapiens]
309977 EOS09908 AW451663 EST singleton (not in UniGene) with exon hit 
CH22_717FG_161_2_UNK_EM:AC005500.GENSCAN.42-2

CH22_FGENES.161_2 
C x_hs gi|5868693|ref( gn 2 + 67924 6801 9 ex 6 8 COS) 3.30 96 1 882 

CH.X.hs gi|5868693 
C!0_p2 gi|3983526|gb|A gn 3 * 7425 7561 ex 1 3 CDSI 4.33 1 37 22 

CH.10_p2gi|3983526 
c20_hs gij6552455|ref| gn 1 + 146726 146838 ex 11 11 CDSI 1.84 113 767 

CH.20_hs gi|6552455 
K06538 Hs.12270 ESTs 
W23986 Hs.34578 alpha2;3-sialyltransferase
C_4_hs gi|5867847|refi gn 1 - 169293 169362 ex 2 3 CDS) -0.28 70 782 

CK04_hs gi|5867847 
CH22_8405FG_UNK_DJ579N1 6.GENSCAN.5-6 

CH22_DJ579N16.GENSCAN.5-8 
AA918274 Hs.76067 heat shock 27kD protein 1 
D59968 EST cluster (not in UniGene) 

c12_hs gi| 5866941 1 ref| gn 3 - 372480 372621 ex 2 3 CDSi 9.16 142 1026 

CH.12_hs gi|5866941 
AI064724 Hs.228468 ESTs 

c16_p2gi|51()3803|gb|Agn3+188050188193ex8 8CDSI 2.01 144 361 

CH.16_p2gi|5103803 
AA632817 Hs.1 90316 ESTs 
CH22_8262FG_UNK_BA354l12.GENSCAN.21-3 

CH22_BA354l12.GENSCAN.21-3 
AI078483 Hs.1 34549 ESTs 
AL120518 Hs.105352 ESTs 

AA311443 Hs.251416 Homo sapiens mRNA; cDNA DKFZp586E2317 (from clone DKFZp586E2317)
CH22_3200FG_620_1_UNK_EM;AC005500.GENSCAN.512-1 

CH22.FGENES.620J 
CH22_41 55FG_43_5_ CH22.FGENES.43-5 
c19_p2 gil6015314|gb|A gn 1 - 5768 5635 ex 4 9 CDSi 2.88 68 162 

CH.19_p2gi'l6015314 
CH22 8272FG_UNK BA354I12.GENSCAN.22-11 

CH22_BA354I12.GENSCAN.22-1 1 
W221 52 EST cluster (not in UniGene) 

CH22_76FG_24_1_UNK_C20H12.GENSCAN.1fr6 

CH22.FGENES.24J 

AA648355 Hs.185155 ESTs; Weakly similar to echinoderm microtubule-associated protein-like EMAP2 [H.sapiens]
CH22_219FG_58_2_UNK_EM:AC000097.GENSCAN.19-2 

CH22.FGENES.58_2 
CH22_691FG_151_5_UNK_EM:AC005500.GENSCAN.32-5 

CH22_FGENES.151_5 
CH22_748FG_168_6_UNK_EM:AC005500.GENSCAN.47-5 

CH22_FGENES.168_6 
CH22_8123FG_UNK_DA59H18.GENSCAN.72-16 

CH22_DA59H18.GENSCAN.72- 1 6 
CH22_4818FG_397_7_ CH22.FGENES. 397-7 
AW298359 Hs.221069 ESTs 
AW015736 Hs.211378 ESTs 
AI470235 Hs.1 72698 EST 

CH22 3062FG_599_1 _UNK_EM:AC005500.GENSCAN.490-2 

CH22_FGENES.599_1 
AW062570 Hs.13809 ESTs 

W93278 EST singleton (not in UniGene) with exon hit 

AI20221 1 EST singleton (not in UniGene) with exon hit 

CH22_1344FG_327_21_UNK_EM:AC005500.GENSCAN.181-23 

CH22_FGENES.327_21 
C21_hs gi|6531965|ref| gn 18 • 1380806 1381443 ex 1 5 CDSI 30.65 638 943 

CH.21_hsgi|6531965 
c17_hs gi|5867176iref| gn 1 + 70854 70915 ex 6 8 CDSi -1.46 62 127 

CH.17_hs giJ5867176 
c14 hsgi|5866996|refign 28 - 981751 981849 ex 1 10 CDSI 1.46 99101 

CH.14_hs gi|5866996 
T81429 EST cluster (not in UniGene) 

CH22 1589FG 372 4 UNK EM:AC005500.GENSCAN.232-5 

CH22_FGENES.372_4 
AA203135 Hs.1 301 86 ESTs 

AA81 5426 EST singleton (not in UniGene) with exon hit 

AI334078 Hs.1 52438 ESTs 
AI589618 Hs.1 924 13 ESTs 

c21_hs gi|6531965|ref] gn 24 - 1924026 19241 10 ex 2 6 CDSi 9.43 85 1012 

CH.21_hs gi|6531965 
AW450376 Hs.1 30803 ESTs 

Al 1 44243 EST singleton (not in UniGene) with exon hit 

AF077208 EST cluster (not in UniGene) 

c19_hs gi|5867362|refl gn 3 - 45283 45375 ex 3 3 CDSI 5.65 93 923 

CH.19_hs gi|5867362 
CH22_122lFG_303_1_UNK_EM:AC005500.GENSCAN.162-5 

CH22_FGENES.303_1 
AW299534 EST cluster (not in UniGene) 

C17 j>2 gi|6478962|gb|A gn 3 + 75145 75287 ex 3 3 CDSI -2.56 143 150 

CH.17_p2 gj|6478962 
CH22 5896FG_UNK_C20H12.GENSCAN.16-2 

CH22 C20H12.GENSCAN.16-2 
CH22 134FG_36_18_UNK_C20H12.GENSCAN,28-17 

CH22_FGENES.36_18 
1.1
1.1 

1.1 
1.1 
1.1 
1.1 

1.1 
1.1 

1.1 

1.1 
1.1 

1.1 
1.1 

1.1 

1.1 

1.1 

1.1 
1.1 
1.1 
1.1 
1.1 

1.1 
1.1 
1.1 
1.1 

1.1 

1.1 

1.1 

1.1 
1.1 

1.1 
1.1 
1.1 
1.1 
1.1 

1.1 
1.1 
1.1 
1.1 

1.1 

1.1 
1.1 

1.1 

1.1 

1.1 
310026 EOS09957 T24895 Hs.100691 ESTs 1.1
330153 EOS30084 c21_p2 gi|4325335|gb|A gn 2 + 146951 147475 ex 2 2 CDSl 25.45 525 233 CH.21_p2 gi|4325335 1.1
334118 EOS34049 CH22_1396FG_330_19_UNK_EM:AC005500.GENSCAN.185-20 CH22_FGENES.330_19 1.1
324795 EOS24726 AI494481 Hs.141579 ESTs 1.1
332530 EOS32461 M31682 Hs.1735 inhibin; beta B (activin AB beta polypeptide) 1.1
332048 EOS31979 AA496019 Hs.201591 ESTs 1.1
334532 EOS34463 CH22_1834FG_402_13_UNK_EM:AC005500.GENSCAN.266-13 CH22_FGENES.402_13 1.1
329762 EOS29693 c14_p2 gi|6048280|emb| gn 3 + 127744 127878 ex 2 4 CDSi 11.66 135 1054 CH.14_p2 gi|6048280 1.1
332909 EOS32840 CH22_130FG_36_13_UNK_C20H12.GENSCAN.28-10 CH22_FGENES.36_13 1.1
321253 EOS21184 AI699484 EST cluster (not in UniGene) 1.1
336572 EOS36503 CH22_4007FG_843_12_UNK_DJ579N16.GENSCAN.15-13 CH22_FGENES.843_12 1.1
328768 EOS28699 c_7_hs gi|6017031|ref| gn 5 - 223741 224238 ex 1 1 CDSo 30.00 498 5285 CH.07_hs gi|6017031 1.1
334335 EOS34266 CH22_1627FG_375_12_UNK_EM:AC005500.GENSCAN.235-12 CH22_FGENES.375_12 1.1
334063 EOS33994 CH22_1341FG_327_17_UNK_EM:AC005500.GENSCAN.181-20 CH22_FGENES.327_17 1.1
333011 EOS32942 CH22_235FG_61_3_UNK_EM:AC000097.GENSCAN.23-3 CH22_FGENES.61_3 1.1
304677 EOS04608 AA548071 EST singleton (not in UniGene) with exon hit 1.1
313948 EOS13879 AW452823 Hs.135268 ESTs 1.1
334358 EOS34289 CH22_1652FG_378_1_UNK_EM:AC005500.GENSCAN.239-1 CH22_FGENES.378_1 1.1
328479 EOS28410 c_7_hs gi|5868449|ref| gn 1 - 331 560 ex 1 31 CDSi 18.51 230 2100 CH.07_hs gi|5868449 1.1
335813 EOS35744 CH22_3185FG_618_1_UNK_EM:AC005500.GENSCAN.510-1 CH22_FGENES.618_1 1.1
312430 EOS12361 AW139117 Hs.117494 ESTs 1.1
324783 EOS24714 AA640770 EST cluster (not in UniGene) 1.1
337776 EOS37707 CH22_6132FG_UNK_EM:AC000097.GENSCAN.119-18 CH22_EM:AC000097.GENSCAN.119-18 1.1
327205 EOS27136 c_1_hs gi|5867447|ref| gn 5 + 167335 167576 ex 9 9 CDSl 15.50 242 259 CH.01_hs gi|5867447 1.1
315198 EOS15129 AI741506 Hs.186753 ESTs; Weakly similar to !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.sapiens] 1.1
336135 EOS36066 CH22_3525FG_704_3_UNK_DA59H18.GENSCAN.9-5 CH22_FGENES.704_3 1.1
318558 EOS18489 AW402677 Hs.90372 ESTs 1.1
328152 EOS28083 c_6_hs gi|5868060|ref| gn 1 - 73981 74203 ex 1 8 CDSl 31.69 223 3411 CH.06_hs gi|5868060 1.1
330211 EOS30142 c_5_p2 gi|6013592|gb|A gn 1 + 59158 59215 ex 2 4 CDSi 4.20 58 184 CH.05_p2 gi|6013592 1.1
339280 EOS39211 CH22_8234FG_UNK_BA354I12.GENSCAN.14-12 CH22_BA354I12.GENSCAN.14-12 1.1
332045 EOS31976 AA491253 Hs.155045 bromodomain adjacent to zinc finger domain; 2A 1.1
313597 EOS13528 AW162263 Hs.249990 ESTs 1.1
329503 EOS29434 c10_p2 gi|3983517|gb|U gn 2 - 1801 1937 ex 1 4 CDSl 4.33 137 101 CH.10_p2 gi|3983517 1.1
333488 EOS33419 CH22_740FG_167_3_UNK_EM:AC005500.GENSCAN.46-10 CH22_FGENES.167_3 1.1
311960 EOS11891 AW440133 Hs.189690 ESTs 1.1
320590 EOS20521 U67056 Hs.168102 Human proteinase activated receptor-2 mRNA; 3' UTR 1.1
334047 EOS33978 CH22_1325FG_326_5_UNK_EM:AC005500.GENSCAN.175-5 CH22_FGENES.326_5 1.1
304782 EOS04713 AA582081 EST singleton (not in UniGene) with exon hit 1.1
324231 EOS24162 W60827 EST cluster (not in UniGene) 1.1
327212 EOS27143 c_1_hs gi|5867463|ref| gn 1 - 42308 42424 ex 5 13 CDSi 6.58 117 325 CH.01_hs gi|5867463 1.1
335857 EOS35788 CH22_3232FG_629_1_UNK_EM:AC005500.GENSCAN.519-1 CH22_FGENES.629_1 1.1
317775 EOS17706 AA974603 Hs.181123 ESTs 1.1
331053 EOS30984 N70242 Hs.183146 ESTs 1.1
335940 EOS35871 CH22_3318FG_646_13_UNK_DJ246D7.GENSCAN.1-12 CH22_FGENES.646_13 1.1
322568 EOS22499 W87342 Hs.209652 ESTs 1.1
314091 EOS14022 AI253112 Hs.133540 ESTs 1.1
313570 EOS13501 AA041455 Hs.209312 ESTs 1.1
300967 EOS00896 AA565209 Hs.190216 ESTs 1.1
314544 EOS14475 AA399018 Hs.250835 ESTs 1.1
328321 EOS28252 c_7_hs gi|5868373|ref| gn 7 - 1029614 1029673 ex 1 3 CDSl -2.40 60 448 CH.07_hs gi|5868373 1.1
310979 EOS10910 AW445166 Hs.170802 ESTs 1.1
310730 EOS10661 AJ939421 Hs.160900 ESTs 1.1
318471 EOS18402 AW137725 Hs.146874 ESTs 1.1
315533 EOS15464 AW206191 Hs.152774 ESTs 1.1
325751 EOS25682 c14_hs gi|6682474|ref| gn 4 + 130437 130520 ex 6 7 CDSi 0.22 84 1666 CH.14_hs gi|6682474 1.1
318780 EOS18711 R90906 Hs.113307 ESTs 1.1
313271 EOS13202 AW444819 Hs.144851 ESTs; Weakly similar to C09F5.2 [C.elegans] 1.1
304546 EOS04477 AA486074 EST singleton (not in UniGene) with exon hit 1.1
330618 EOS30549 X55990 Hs.73839 ribonuclease; RNase A family; 3 (eosinophil cationic protein) 1.1
332931 EOS32862 CH22_152FG_36_5_UNK_C20H12.GENSCAN.29-5

CH22.FGENES.38_5 
336602 EOS36533 CH22_4047FG_372 4_UNK_EM:AC005500.GENSCAN.232-4 

CH22.FGENES.372.4 
AI638294 Hs.224665 ESTs 
CH22_5873FG__UNK C20H12.GENSCAN.5-3 

CH22_C20H12.GENSCAN.5-3 
AW071751 Hs.13179 ESTs; Moderately similar to !!!! ALU SUBFAMILY SQ WARNING ENTRY !!!! [H.sapiens]
AA410183 Hs.137475 ESTs 
AI373163 Hs.170333 ESTs 

CH22J 245FG 307_4_UNK_EM:AC005500.GENSCAN. 165-5 

CH22_FGENES.307_4 
Al 1 87742 HS.1 25562 ESTs 

AI608947 EST singleton (not in UniGene) with exon hit 

c17_hsgi|5867254|ref|gn6- 112000 112137 ex24CDSi 8.01 138 1952 
CH.17_hsgi|5667254 
336023 EOS35954 CH22 3406FG 669 12 UNK DJ32J10.GENSCAN.9- 17 

CH22_FGENES.669_12 
AA278246 EST cluster (not in UniGene)

CH22_3477FG_689_2_UNK_DJ32l10.GENSCAN.23-20 

CH22_FGENES.669_2 
AW237220 Hs.211130 ESTs 

CH22_2409FG_488_4_UNK_EM:AC005500.GENSCAN.384-6 

CH22 FGENES.488_4 
AW148940 Hs.248647 EST 
H49160 Hs. 133472 ESTs 

T97905 EST cluster (not in UniGene) with exon hit 

AI832201 Hs.2 11469 ESTs 
R08673 Hs. 17751 4 ESTs 

C14_02 gi|6672062|emb| gn 2 + 33990 34098 ex 3 4 COSi 9.1 1 109 2222 

CH.14_p2gi|6672062 
AB019571 EST cluster (not in UniGene) with exon hit

CH22.1 731 FG_385_8_UNK_EM:AC005500.GENSCAN.249-6 

CH22_FGENES.385_8 
AA57781 6 EST singleton (not in UniGene) with exon hit 

CH22_513FG_121_1_UNK_EM:AC005500.GENSCAN.4-11 
CH22.FGENES.121J 
EST 
ESTs 

ESTs; Weakly similar to cDNA EST yk414c9.3 comes from this gene [C.elegans] 
ESTs 

EST cluster (not in UniGene) with exon hit
ESTs
ESTs

EST cluster (not in UniGene)
ribosomal protein; large; P0
CH22_1153FG_292_4_UNK_EM:AC005500.GENSCAN.150-4 

CH22_FGENES.292_4 
CH22_5960FG_UNK_EM:AC000097.GENSCAN.10-8 

CH22_EM:AC000097.GENSCAN.10-8 
CH22_2983FG_584_2_UNK_EM:AC005500.GENSCAN.478-2 
CH22_FGENES.584_2 
ESTs 
ESTs 
ESTs 
ESTs 

spinocerebellar ataxia 7 (olivopontocerebellar atrophy with retinal degeneration)
ESTs 

EST cluster (not in UniGene) with exon hit 
ESTs 

c_7_hs gi|5868330|ref| gn 1 + 90446 90602 ex 3 4 CDSi 10.20 157 5634 

CH.07_hsgi|5868330 
T90622 Hs.82609 hydroxymethylbilane synthase 
AI420742 Hs.163502 ESTs 
N53480 Hs.108622 ESTs 
AA564740 Hs.258401 ESTs 

HG2730-HT2827 Fibrinogen, A Alpha Polypeptide, Alt. Splice 2, E

AF067797 EST cluster (not in UniGene) with exon hit

c21_p2 gi|4210430|emb| gn 1 - 22334 22460 ex 3 3 CDSf 16.56 127 105 
CH.21_p2gi|4210430 
332952 EOS32883 CH22.1 76FG_48_8_UNK_EM:AC000097.GENSCAN.2-4 

CH22J=GENES.48_8 
T77136 Hs.8765 RNA helicase-related protein 
AA411263 Hs.128783 ESTs 
CH22 3625FG 730 2_UNK_DA59H18.GENSCAN.36-2 

CH22_FGENES.730_2 
AI833168 Hs. 184507 Homo sapiens Chromosome 16 BAC clone CIT987SK-A-328A3 
AW296132 Hs.166674 ESTs 
CH22_8326FG_UNK_BA354I12.GENSCAN.31-1 

CH22_BA3541 1 2.G ENSCAN . 3 1 - 1 
AW502125 EST cluster (not in UniGene) 

F11330 Hs. 177633 ESTs 
Y13323 Hs.145296 disintegrin protease
CH22_6944FG_UNK_EM:AC005500.GENSCAN.304-2 

CH22_EM:AC005500.GENSCAN.304-2 
333964 EOS33895 CH22J 241 FG_305_2_UNK_EM:AC005500.GENSCAN. 164-2 



311185 
337585 

310249 
314578 
310750 
333968 

316133 
308337 
326160 



323479 
336090 

311192 
335081 

309519 
321172 
301976 
323012 
319528 
329838 

302623 
334433 

304747 
333270 

307054 
320764 
321523 
322114 
303582 
322924 
311179 
318601 
309791 
333882 



EOS11116 
EOS37516 

EOS10180 
EOS14509 
EOS10681 
EOS33899 

EOS16064 
EOS08268 
EOS26091 



EOS23410 
EOS36021 

EOS11123 
EOS35012 

EOS09450 
EOS21103 
EOS01907 
EOS22943 
EOS19459 
EOS29769 

EOS02554 
EOS34364 

EOS04678 
EOS33201 

EOS06985 
EOS20695 
EOS21454 
EOS22045 
EOS03513 
EOS22855 
EOS11110 
EOS18532 
EOS09722 
EOS33813 



337645 EOS37576 



335623 EOS35554 



314745 
330790 
332071 
312005 
330694 
330739 
303042 
323091 
328820 

300472 
310645 
332238 
300966 
330437 
302292 
330138 



319901 
321166 
336227 

302332 
313800 
339356 

324512 
319235 
320352 
338316 



EOS14676 
EOS30721 
EOS32002 
EOS 11 936 
EOS30625 
EOS30670 
EOS02973 
EOS23022 
EOS28751 

EOS00403 
EOS10576 
EOS32169 
EOS00897 
EOS30368 
EOS02223 
EOS30069 



EOS19632 
EOS21097 
EOS36158 

EOS02263 
EOS13731 
EOS39287 

EOS24443 
EOS19166 
EOS20283 
EOS38247 



AI148181 

R73070 

H78472 

AA643791 

AA377444 

AA669253 

A1880843 

T39921 

AW276176 



AA564489 

T48536 

AA598594 

T78450 

AA019806 

AA293477 

AF129532 

AW014094 



Hs.176835 
Hs.246927 
Hs.191325 
HS.191740 

Hs.193971 
Hs 223333 

Hs.73742 



Hs.137526 

Hs. 105807 

Hs.112475 

Hs.13941 

Hs.108447 

HS.227591 

HS.210761 
1.1
1.1
1.1
1.1
1.1
1.1 
1.1 
1.1 

1.1 
1.1 
1.1
1.1 
312758
338178 

315199 
312321 
338765 

330547 
315368 
328691 



EOS12689 
EOS38109 

EOS15130 
EOS12252 
EOS38696 

EOS30478 
EOS15299 
EOS28622 



329179 EOS29110 
327072 EOS27003 



312056 
339128 

307646 
319198 
338556 

306143 
332384 
325100 
309839 
312180 
330385 
315882 
325843 

330783 
317224 
316042 
333524 

302357 
309830 
321489 
312304 
322026 



EOS11987 
EOS39059 

EOS07577 
EOS19129 
EOS38487 

EOS06074 
EOS32315 
EOS25031 
EOS09770 
EOS12111 
EOS30316 
EOS15813 
EOS25774 

EOS30714 
EOS17155 
EOS 15973 
EOS33455 

EOS02288 
EOS09761 
EOS21420 
EOS12235 
EOS21957 



CH22_FGENES.305_2 
AA721107 Hs.202604 ESTs 
CH22_6726FG_UNK_EM:AC005500.GENSCAN.21 9-6 

CH22_EM:AC005500.GENSCAN.21 9-6 
AA877996 Hs. 125376 ESTs 
R66210 Hs.186937 ESTs 
CH22 7588FG_UNK_EM:AC005500.GENSCAN.518-1 

CH22_EM:AC005500.GENSCAN.518-1 
U32989 Hs. 183671 tryptophan 2;3-dioxygenase 
AW291563 Hs.152495 ESTs 

c 7 hs gi|6588001 |ref| gn 7 - 579598 579664 ex 2 3 COSi 12.78 67 4326 

CH.07_hsgi|6588001 
c_x_hs gi|5868704|ref| gn 2 + 181639 181815 ex 3 4 CDSi 0.32 177 1939 

CH.X_hs gi|5868704 
c21 hs gi|6531965[ref| gn 55 • 3796429 3797197 ex 4 4 CDSf 9.33 769 1270 

CH.21_hs gi|6531965 
T83748 Hs.189712 ESTs 
CH22_8046FG_UNK_DA59Hl8.GENSCAN.55-2 

CH22_DA59H18.GENSCAN.55-2 
AI302236 EST singleton (not in UniGene) with exon hit 

F07354 EST cluster (not in UniGene) 

CH22 7283FG_UNK_EM:AC005500.GENSCAN.41 7-8 

CH22_EM:AC005500.GENSCAN.41 7-8 
EST singleton (not in UniGene) with exon hit 
retinol-binding protein 1; cellular 

ESTs; Weakly similar to coded for by C. elegans cDNA yk30b3.5 [C.elegans]
EST singleton (not in UniGene) with exon hit
ESTs

ESTs; Highly similar to secreted apoptosis related protein 1 [H.sapiens]
ESTs 

c16_hs gij6552453|ref| gn 1 - 7126 7232 ex 1 3 CDSI 1.87 107 182 

CH.16_hsgi|6552453 
D60050 Hs.34812 ESTs 
D56760 Hs.8122 ESTs 
AW297979 Hs.170698 ESTs 

CH22 781FG 175 10 LINK EM:AC005500.GENSCAN.S3-15 
CH22_FGENES.175_10 

group-specific component (vitamin D binding protein) 
EST singleton (not in UniGene) with exon hit 

ESTs; Moderately similar to !!!! ALU SUBFAMILY SQ WARNING ENTRY !!!! [H.sapiens]
ESTs 

low density lipoprotein receptor (familial hypercholesterolemia) 



AA916314 

M11433

T10265 

AW296076 

AI248285 

AA449749 

AI831297 



Hs.101850 
Hs.116122 

Hs.11 8348 

Hs.31386 

Hs.123310 



X03176 Hs.198246 
AW294725 

AW392474 Hs.172759 

AA491949 Hs. 183359 

AA233527 Hs.213289 
1.1
1.1
1.1
1.1 
1.1 
1.1 
1.1 

1.1 
1.1
Table 2 provides the nucleic acid and protein sequence of the CBF9 gene as well as the
UniGene and Exemplar accession numbers for CBF9.

TABLE 2 CBF9 DNA and Protein Sequences

CBF9 DNA sequence

Gene name: ESTs
UniGene number: Hs.157601
Probeset Accession #: W07459
Nucleic Acid Accession #: AC005383
Coding Sequence: 328-2751 (underlined sequences correspond to start and stop codons)

15 1 11 21 31 41 51 

« i I ' I I I I 

"2 GACAGTGTTC GCGGCTGCAC CGCTCGGAGG CTGGGTGACC CGCGTAGAAG TGAAGTACTT 60 

•tU TTTTATTTGC AG AC C TGGGC CGATGCCGCT TTAAAAAACG CGAGGGGCTC TATGCACCTC 12 0 

v3 CCTGGCGGTA GTTCCTCCGA CCTCAGCCGG GTCGGGTCGT GCCGCCCTCT CCCAGGAGAG 180 

hQO ACAAACAGGT GTCCCACGTG GCAGCCGCGC CCCGGGCGCC CCTCCTGTGA TCCCGTAGCG 240 

=?S CCCCCTGGCC CGAGCCGCGC CCGGGTCTGT GAGTAGAGCC GCCCGGGCAC CGAGCGC TGG 3 00 

TCGCCGCTCT CCTTCCGTTA TATCAACATG CCCCCTTTCC TGTTGCTGGA GGCCGTCTGT 3 60 

'ij GTTTTCCTGT TTTCCAGAGT GCCCCCATCT CTCCCTCTCC AGGAAGTGCA TGTAAGCAAA 420 

§Lj GAAACCATCG GGAAGATTTC AGCTGCCAGC AAAATGATGT GGTGCTCGGC TGCAGTGGAC 480 

|T25 ATCATGTTTC TGTTAGATGG GTCTAACAGC GTCGGGAAAG GGAGCTTTGA AAGGTCCAAG 540 

CACTTTGCCA TCACAGTCTG TGACGGTCTG GACATCAGCC CCGAGAGGGT CAGAGTGGGA 600 

GCATTCCAGT TCAGTTCCAC TCCTCATCTG GAATTCCCCT TGGATTCATT TTCAACCCAA 660 

Q CAGGAAGTGA AGGCAAGAAT CAAGAGGATG GTTTTCAAAG GAGGGCGCAC GGAGACGGAA 720 

SO CTTGCTCTGA AATACCTTCT GCACAGAGGG TTGCCTGGAG GCAGAAATGC TTCTGTGCCC 7 80 

1^30 CAGATCCTCA TCATCGTCAC TGATGGGAAG TCCCAGGGGG ATGTGGCACT GCCATCCAAG 840 

V CAGCTGAAGG AAAGGGGTGT CACTGTGTTT GCTGTGGGGG TCAGGTTTCC CAGGTGGGAG 900 

!J GAGCTGCATG C AC TGGC C AG C G AGCC TAG A GGGCAGCACG TGCTGTTGGC TGAGCAGGTG 960 

GAGGATGCCA CCAACGGCCT CTTCAGCACC CTCAGCAGCT CGGCCATCTG CTCCAGCGCC 1020 

H ACGCCAGACT GCAGGGTCGA GGCTCACCCC TGTGAGCACA GGACGCTGGA GATGGTCCGG 1080 

35 GAGTTCGCTG GCAATGCCCC ATGC TGG AG A GGATCGCGGC GGACCCTTGC GGTGCTGGCT 1140 

GCACACTGTC CCTTCTACAG C TGG AAG AG A GTGTTCCTAA CCCACCCTGC CACCTGCTAC 1200 

AGGACCACCT GCCCAGGCCC CTGTGACTCG CAGCCCTGCC AGAATGGAGG CACATGTGTT 12 60 

CCAGAAGGAC TGGACGGCTA CCAGTGCCTC TGCCCGCTGG CCTTTGGAGG GGAGGCTAAC 132 0 

TGTGCCCTGA AGCTGAGCCT GGAATGCAGG GTCGACCTCC TCTTCCTGCT GGACAGCTCT 1380 

40 GCGGGCACCA CTCTGGACGG CTTCCTGCGG GCCAAAGTCT TCGTGAAGCG GTTTGTGCGG 1440 

GCCGTGCTGA GCGAGG AC TC TCGGGCCCGA GTGGGTGTGG CCACATACAG CAGGGAGCTG 1500 

CTGGTGGCGG TGCCTGTGGG GG AG T AC C AG GATGTGCCTG ACCTGGTCTG GAGCCTCGAT 1560 

GGCATTCCCT TC C G TGG TGG CCCCACCCTG ACGGGCAGTG CCTTGCGGCA GGCGGCAGAG 162 0 

CGTGGCTTCG GGAGCGCCAC CAGGACAGGC CAGGACCGGC CACGTAGAGT GGTGGTTTTG 1680 

45 CTCACTGAGT CACACTCCGA GGATGAGGTT GCGGGCCCAG CGCGTCACGC AAGGGCGCGA 1740 

GAGCTGCTCC TGCTGGGTGT AGGCAGTGAG GCCGTGCGGG CAGAGCTGGA GGAGATCACA 1800 

GGCAGCCCAA AGC ATGTGAT GGTCTACTCG GATCCTCAGG ATCTGTTCAA CCAAATCCCT 1860 

GAGCTGCAGG GGAAGCTGTG CAGCCGGCAG CGGCCAGGGT GCCGGACACA AGCCCTGGAC 192 0 

CTCGTCTTCA TG TTGG AC AC CTCTGCCTCA GTAGGGCCCG AGAATTTTGC TCAGATGCAG 1980 

50 AGCTTTGTGA GAAGCTGTGC CCTCCAGTTT GAGGTGAACC CTGACGTGAC ACAGGTCGGC 2040 
CTGGTGGTGT ATGGCAGCCA GGTGCAGACT GCCTTCGGGC TGG AC AC C AA ACCCACCCGG .2100 

GCTGCGATGC TGCGGGCCAT TAGCCAGGCC CCCTACCTAG GTGGGGTGGG CTCAGCCGGC 2160 

ACCGCCCTGC TGCACATCTA TGACAAAGTG ATGACCGTCC AGAGGGGTGC CCGGCCTGGT 222 0 

GTCCCCAAAG CTGTGGTGGT GCTCACAGGC GGGAGAGGCG CAGAGGATGC AGCCGTTCCT 22 80 

55 GCCCAGAAGC TGAGGAACAA TGGCATCTCT GTCTTGGTCG TGGGCGTGGG GCCTGTCCTA 2340 

AG TG AGGGTC TGCGGAGGCT TGCAGGTCCC CGGGATTCCC TGATCCACGT GGCAGCTTAC 2400 

GCCGACCTGC GGTACCACCA GGACGTGCTC ATTGAGTGGC TGTGTGGAGA AGCCAAGCAG 2460 

CCAGTCAACC TCTGCAAACC CAGCCCGTGC ATGAATGAGG GCAGCTGCGT CCTGCAGAAT 2520

GGGAGCTACC GCTGCAAGTG TCGGGATGGC TGGGAGGGCC CCCACTGCGA GAACCGTGAG 2580

TGGAGCTCTT GCTCTGTATG TGTGAGCCAG GGATGGATTC TTGAGACGCC CCTGAGGCAC 2640

ATGGCTCCCG TGCAGGAGGG CAGCAGCCGT ACCCCTCCCA GCAACTACAG AGAAGGCCTG 2700

GGCACTGAAA TGGTGCCTAC CTTCTGGAAT GTCTGTGCCC CAGGTCCTTA GAATGTCTGC 2760

TTCCCGCCGT GGCCAGGACC ACTATTCTCA CTGAGGGAGG AGGATGTCCC AACTGCAGCC 2820

ATGCTGCTTA GAGACAAGAA AGCAGCTGAT GTCACCCACA AACGATGTTG TTGAAAAGTT 2880 

TTGATGTGTA AGTAAATACC CACTTTCTGT ACCTGCTGTG CCTTGTTGAG GCTATGTCAT 2940

CTGCCACCTT TCCCTTGAGG ATAAACAAGG GGTCCTGAAG ACTTAAATTT AGCGGCCTGA 3000

CGTTCCTTTG CACACAATCA ATGCTCGCCA GAATGTTGTT GACACAGTAA TGCCCAGCAG 3060

AGGCCTTTAC TAGAGCATCC TTTGGACGGC GAAGGCCACG GCCTTTCAAG ATGGAAAGCA 3120

GCAGCTTTTC CACTTCCCCA GAGACATTCT GGATGCATTT GCATTGAGTC TGAAAGGGGG 3180 

CTTGAGGGAC GTTTGTGACT TCTTGGCGAC TGCCTTTTGT GTGTGGAAGA GACTTGGAAA 3240

GGTCTCAGAC TGAATGTGAC CAATTAACCA GCTTGGTTGA TGATGGGGGA GGGGCTGAGT 3300

TGTGCATGGG CCCAGGTCTG GAGGGCCACG TAAAATCGTT CTGAGTCGTG AGCAGTGTCC 3360
ACCTTGAAGG TCTTC 

CBF9 Protein sequence

Gene name: ESTs 
Unigene number: Hs.157601

Protein Accession #: none found 

Signal sequence: 1-17 
Transmembrane domains: none found

VGW domains: 49-223; 341-518; 529-706 

EGF domains: 298-333; 715-748 

Cellular Localization: plasma membrane.
1          11         21         31         41         51

|          |          |          |          |          |

MPPFLLLEAV CVFLFSRVPP SLPLQEVHVS KETIGKISAA SKMMWCSAAV DIMFLLDGSN 60

SVGKGSFERS KHFAITVCDG LDISPERVRV GAFQFSSTPH LEFPLDSFST QQEVKARIKR 120

MVFKGGRTET ELALKYLLHR GLPGGRNASV PQILIIVTDG KSQGDVALPS KQLKERGVTV 180

FAVGVRFPRW EELHALASEP RGQHVLLAEQ VEDATNGLFS TLSSSAICSS ATPDCRVEAH 240

PCEHRTLEMV REFAGNAPCW RGSRRTLAVL AAHCPFYSWK RVFLTHPATC YRTTCPGPCD 300

SQPCQNGGTC VPEGLDGYQC LCPLAFGGEA NCALKLSLEC RVDLLFLLDS SAGTTLDGFL 360

RAKVFVKRFV RAVLSEDSRA RVGVATYSRE LLVAVPVGEY QDVPDLVWSL DGIPFRGGPT 420

LTGSALRQAA ERGFGSATRT GQDRPRRVVV LLTESHSEDE VAGPARHARA RELLLLGVGS 480

EAVRAELEEI TGSPKHVMVY SDPQDLFNQI PELQGKLCSR QRPGCRTQAL DLVFMLDTSA 540

SVGPENFAQM QSFVRSCALQ FEVNPDVTQV GLVVYGSQVQ TAFGLDTKPT RAAMLRAISQ 600

APYLGGVGSA GTALLHIYDK VMTVQRGARP GVPKAVVVLT GGRGAEDAAV PAQKLRNNGI 660

SVLVVGVGPV LSEGLRRLAG PRDSLIHVAA YADLRYHQDV LIEWLCGEAK QPVNLCKPSP 720

CMNEGSCVLQ NGSYRCKCRD GWEGPHCENR EWSSCSVCVS QGWILETPLR HMAPVQEGSS 780

RTPPSNYREG LGTEMVPTFW NVCAPGP
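
The nucleic acid and protein listings above can be checked against each other by translation. The following is a minimal sketch only, assuming the standard genetic code; the function name `translate` and the way the codon table is built are illustrative and are not part of the disclosure. It translates the 60-nucleotide row ending at position 1680 of the nucleic acid sequence, read in frame from its first base, so that the result can be compared with the corresponding stretch of the protein sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in
# T, C, A, G order for each of the three positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace between blocks is ignored.

    Hypothetical helper for illustration: trailing bases that do not form a
    complete codon are dropped, and stop codons are rendered as '*'.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Row of the nucleic acid listing covering positions 1621-1680.
fragment = "CGTGGCTTCG GGAGCGCCAC CAGGACAGGC CAGGACCGGC CACGTAGAGT GGTGGTTTTG"
peptide = translate(fragment)
print(peptide)
```

The same helper can be applied to any other row, provided the starting offset is chosen so that the row is read in the frame of the protein.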